Scraping a webpage using Ruby

Scraping a webpage using Ruby is straightforward with the Nokogiri gem.

First, install the ‘nokogiri’ gem by adding it to your Gemfile or by running the following command:

gem install nokogiri
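
If you prefer the Gemfile route, the entry could look like the line below. This is a minimal sketch that assumes your project already has a Gemfile managed by Bundler; no version constraint is given because the post does not name one.

```ruby
# Gemfile (assumes an existing Bundler-managed project)
gem 'nokogiri'
```

After adding the line, run `bundle install` so the gem is installed alongside your other dependencies.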

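Then, in your controller, open the webpage you want to scrape:

require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open.
page = Nokogiri::HTML(URI.open("http://webpage.com"))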

Once you have opened the webpage, you are ready to access the content.

Given the following HTML:

</div>
<div class="term">
    <div class="word">Scraping</div>
    <div class="definition">is a technique of extracting information from websites</div>
</div>
<div class="term">
....

You can get the complete list of terms using the “search” method:

terms = page.search('.term')

terms.each do |t|
  word = t.search('.word').first.content
  definition = t.search('.definition').first.content

  puts word
  puts definition
end

There are several other ways to scrape a webpage with Nokogiri. See the Nokogiri tutorials to learn more.

 

You can also get the working example from https://github.com/andresiglesias/scraper
