How to split in Ruby scraping: ERROR undefined method

Time:01-19

I am scraping a website, but I only need the year. I tried using split, but I get an error.


CodePudding user response:

You could use a regex to extract only the year once you have the list.

Of course, this only works if what you are showing is the consistent pattern, since the year would then be the only run of exactly four digits.

Example: for 17.01.2023, 17:40 the pattern \b\d{4}\b matches 2023.
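
A minimal sketch of that idea in plain Ruby, assuming each scraped entry looks like the sample string above. The variable names `dates` and `years` are made up for illustration:

```ruby
# Sample strings in the same "DD.MM.YYYY, HH:MM" shape as the example above.
dates = ["17.01.2023, 17:40", "07.08.2016, 13:47"]

# String#[] with a Regexp returns the first match, or nil when nothing matches.
# \b\d{4}\b only matches a run of exactly four digits bounded by non-word characters.
years = dates.map { |d| d[/\b\d{4}\b/] }

p years
```

Because the day, month, hour and minute are all at most two digits long, the year is the only part the pattern can match.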

CodePudding user response:

Your issue becomes clear if you look at the output from this block:

re = links3.map do |lk3|
  lk3.css('.name').children.text.strip.split("\n")[2]
end

You will see:

["              07.08.2016, 13:47", nil, nil, nil, nil, "              06.08.2016, 9:24", nil, nil, nil, nil,...]

So you could solve your immediate issue by just adding .compact to the end of that block, which drops the nil entries.
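
A small sketch of why the nils appear and what .compact does, using hypothetical strings in place of the scraped '.name' text (the real page content is an assumption here):

```ruby
# Hypothetical text blocks standing in for the scraped '.name' elements.
texts = [
  "Topic title\nAuthor\n              07.08.2016, 13:47",
  "Only one line",
  ""
]

# Blocks with fewer than three lines have no index 2, so they yield nil.
raw = texts.map { |t| t.strip.split("\n")[2] }

# compact drops the nils; strip removes the leading padding.
dates = raw.compact.map(&:strip)

# Pull the first run of four digits out of each remaining date string.
years = dates.map { |d| d[/\d{4}/] }

p raw
p dates
p years
```

Calling a String method such as split directly on one of the nil entries would raise NoMethodError (undefined method for nil), which is likely the kind of error described in the question.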

That being said, here is another way to solve your issue.

You can get just the year from the text on that page using the following:

require 'nokogiri'
require 'open-uri'

url = "https://www.bananatic.com/de/forum/games/"

doc = Nokogiri::HTML(URI.open(url))

doc
  .xpath('//div[@class="name"]/text()[string-length(normalize-space(.)) > 0]')
  .map {|node| node.to_s[/\d{4}/]}
#=> ["2016", "2016", "2022", "2022", "2022", "2021", "2022", "2017", "2022", "2021", "2019", "2016", "2021", "2021", "2021", "2021", "2020", "2021", "2017", "2021"]

The 2 parts are:

  1. //div[@class="name"]/text()[string-length(normalize-space(.)) > 0] - the XPath, which finds all divs with the class "name" and then pulls their text nodes that are non-empty once surrounding whitespace is trimmed.
  2. .map {|node| node.to_s[/\d{4}/]} - maps these into an array by slicing each String with a regex for 4 contiguous digits.

If you would like the XPath to be as specific as your post you can use:

'//div[@]/ul/li//div[@]/text()[string-length(normalize-space(.)) > 0]'