
Scraping a webpage using Ruby
Scraping a webpage using Ruby takes only a few lines of code with the Nokogiri gem.
First, install the ‘nokogiri’ gem by adding it to your Gemfile or by executing the following command:
gem install nokogiri
Then, in your controller, open the webpage you want to scrape:
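require 'nokogiri'
require 'open-uri'
# Use URI.open: calling the bare open(url) no longer works on Ruby 3.0 and later
page = Nokogiri::HTML(URI.open("http://webpage.com"))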
Once you have opened the webpage, you are ready to access the content.
Given the following HTML:
</div>
<div class="term">
  <div class="word">Scraping</div>
  <div class="definition">is a technique of extracting information from websites</div>
</div>
<div class="term">
....
You can get the complete list of terms using the “search” method:
terms = page.search('.term')
terms.each do |t|
  word = t.search('.word').first.content
  definition = t.search('.definition').first.content
  puts word
  puts definition
end
There are several other methods that can help you scrape a webpage using Nokogiri. Check the complete Nokogiri tutorials to learn more.
You can also get the working example from https://github.com/andresiglesias/scraper