Sunday, 25 February 2018

Ruby Nokogiri CSS HTML Parsing



I'm having some problems trying to get the code below to output the data in the format that I want. What I'm after is the following:




CCC1-$5.00
CCC1-$10.00
CCC1-$15.00
CCC2-$7.00




where $7 belongs to CCC2 and the others to CCC1, but I can only manage to get the data in this format:





CCC1-$5.00
CCC1-$10.00
CCC1-$15.00
CCC1-$7.00
CCC2-$5.00
CCC2-$10.00
CCC2-$15.00
CCC2-$7.00




Any help would be appreciated.



require 'rubygems'  
require 'nokogiri'
require 'open-uri'


doc = Nokogiri::HTML.parse(<<-eohtml)










CCC1








$5.00
$10.00
$15.00










CCC2






$7.00



eohtml


doc.css('td.BBB > span.CCC').each do |something|
  doc.css('tr > td.EEE, tr > td.FFF').each do |something_more|
    puts something.content + '-' + something_more.content
  end
end

Answer



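How about this? The inner doc.css call searches the whole document on every pass, so each CCC label gets paired with every price. Start the inner search from the element two levels above the current span (the parent of its td.BBB cell) instead: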



doc.css('td.BBB > span.CCC').each do |something|
  something.parent.parent.css('tr > td.EEE, tr > td.FFF').each do |something_more|
    puts something.content + '-' + something_more.content
  end
end
