I want to scrape this website.
As you can see, the data is inside a "scrollArea" div; inside that there is a <ul>, then an <li>, then an <a> and a <span>.
I need to save the text of the <a> in one variable and the number in the <span> in another variable, so that, for example, the console shows:
Roblox
146
BigFarm
135...etc
This is my code, which is not working correctly:
`require 'nokogiri'
require 'csv'
require 'open-uri'
link = 'https://www.bananatic.com/es/forum/games/'
pagina = URI.open(link)
datos = pagina.read
documento = Nokogiri::HTML(datos)
#p = documento.css('.container').css('.categories').css('.scrollArea')
r = documento.css('.categories')
# print r
result = r.css('div.scrollArea > ul > li').each do |li|
  name = li.css('span').text.strip
  print name
  number = li.css('a').text.strip
  print number
end`
CodePudding user response:
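The link (<a>) node contains two child nodes: a text node holding the name and the <span> holding the count. The name is not wrapped in an element of its own (unlike the count, which is inside the <span>), so it has to be read from the link's children rather than selected with CSS.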
require 'nokogiri'
require 'open-uri'

# download/cache the data (to speed up testing)
unless File.readable?('data.html')
  url = 'https://www.bananatic.com/de/forum/games/'
  data = URI.open(url).read
  File.open('data.html', 'wb') { |f| f << data }
end

data = File.read('data.html')
document = Nokogiri::HTML(data)
links = document.css('.categories ul li a')

# each <a> has two children: the name (text node) and the count (<span>)
result = links.map do |link|
  name, count = link.children
  [name.text.strip, count.text.to_i]
end
p result
CodePudding user response:
I was able to read the data by tweaking your code as below:
require 'nokogiri'
require 'open-uri'
link = 'https://www.bananatic.com/es/forum/games/'
pagina = URI.open(link)
datos = pagina.read
documento = Nokogiri::HTML(datos)
game_data = []
r = documento.css('div.categories')
result = r.css('div.scrollArea > ul > li').each do |li|
  game_data << {
    name: li.child.children.first.text.strip,
    number: li.child.children.last.text.strip
  }
end
puts game_data
The li node has a child node (the <a>) which itself has 2 child nodes: the first holds the name and the last holds the number.
I have collected all the data in a separate variable, game_data, which is an array of hashes that you can print or use as required.
Hope this helps you understand the issue within the provided code.