How to scrape text from <div><ul><li><a>?

Time:01-13

I want to scrape this website: https://www.bananatic.com/es/forum/games/ As you can see, the data is inside a "scrollArea" div that contains a <ul>, and each <li> in that list holds an <a> with a <span> inside it. I need to save the text of the <a> in one variable and the number in the <span> in another, so that the console shows, for example:

Roblox

146

BigFarm

135...etc

This is my code, which is not working correctly:

require 'nokogiri'
require 'csv'
require 'open-uri'

link = 'https://www.bananatic.com/es/forum/games/'
pagina = URI.open(link)
datos = pagina.read
documento = Nokogiri::HTML(datos)
# p = documento.css('.container').css('.categories').css('.scrollArea')
r = documento.css('.categories')
# print r
result = r.css('div.scrollArea > ul > li').each do |li|
  name = li.css('span').text.strip
  print name

  number = li.css('a').text.strip
  print number
end

CodePudding user response:

The <a> node contains two child nodes: a plain text node holding the name (it is not wrapped in an element of its own) and the <span> element holding the count.

require "nokogiri"
require 'open-uri'
# download/cache the data (to speed up testing)
if !File.readable?("data.html")
  url = "https://www.bananatic.com/de/forum/games/"
  data = URI.open(url).read
  File.open("data.html", "wb") { |f| f << data }
end

data = File.read("data.html")
document = Nokogiri::HTML(data)
links = document.css(".categories ul li a")
result = links.map do |link|
  name, count = link.children
  [name.text.strip, count.text.to_i]
end

p result

CodePudding user response:

I was able to read the data by tweaking your code as below:

require 'nokogiri'
require 'open-uri'

link = 'https://www.bananatic.com/es/forum/games/'
pagina = URI.open(link)
datos = pagina.read
documento = Nokogiri::HTML(datos)
game_data = []
r = documento.css('div.categories')
result = r.css('div.scrollArea > ul > li').each do |li|
  game_data << {
    name:   li.child.children.first.text.strip,
    number: li.child.children.last.text.strip
  }
end

puts game_data

Each li node has an <a> child, which in turn has two children: the text node with the name and the <span> with the number. I collect everything in a separate variable, game_data, an array of hashes that you can print or use as required.

Hope this helps you understand the issue within the provided code.
