Scraping `<div><ul><li><div>` in Ruby with regex `\w`

Time:01-21

I am trying to scrape the forum page at https://www.bananatic.com/de/forum/games/.

    require 'nokogiri'
    require 'open-uri'
    require 'pp'
    require 'csv'


    # Download the page once and cache it in data.html so later runs read the local copy
    unless File.readable?('data.html')
      url = 'https://www.bananatic.com/de/forum/games/'
      data = URI.open(url).read
      File.open('data.html', 'wb') { |f| f << data }
    end
    data = File.read('data.html')
    document = Nokogiri::HTML(data)


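    # NOTE: the attribute test inside [@] appears to have been lost; as written this XPath is not valid
    # [/\d /] returns the first single digit followed by a space, or nil if there is none
    per = document.xpath('//div[@]/text()[string-length(normalize-space(.)) > 0]')
                  .map { |node| node.to_s[/\d /] }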

    p per

    # [/\w /] returns the first single word character followed by a space, or nil if there is none
    pir = document.xpath('//div[@]/text()[string-length(normalize-space(.)) > 0]')
                  .map { |node| node.to_s[/\w /] }

    p pir

    links2 = document.css('.topics ul li div')
    res = links2.map do |lk|
      name = lk.css('.name p a').inner_text # text of the topic link, without markup
      [name]
    end
    p res
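In the `per` and `pir` lines, `/\d /` and `/\w /` each match exactly one character followed by a literal space, and `String#[]` returns only the first match (or `nil` when there is none). A minimal plain-Ruby sketch, using a made-up sample string rather than text from the real page, shows how the `+` quantifier and `String#scan` change the result:

```ruby
# Made-up sample text; the real forum text is not shown in the question.
sample = "  player123 wrote 42 posts  "

# One digit followed by a literal space: only the "3 " at the end of "player123".
one_digit = sample[/\d /]        # => "3 "

# One or more digits: the first complete run of digits.
first_number = sample[/\d+/]     # => "123"

# One word character followed by a space: again "3 ", because \w also matches digits.
one_word_char = sample[/\w /]    # => "3 "

# One or more word characters: the first whole word.
first_word = sample[/\w+/]       # => "player123"

# String#scan returns every match instead of only the first one.
all_numbers = sample.scan(/\d+/) # => ["123", "42"]
all_words = sample.scan(/\w+/)   # => ["player123", "wrote", "42", "posts"]

p one_digit, first_number, one_word_char, first_word, all_numbers, all_words
```

If the goal is whole numbers or words, `\d+` / `\w+` (with `scan` when every match is needed) is probably closer to the intent than a single character plus a space.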

To fix it I added a regular expression, but the attempt failed. I simply replaced `.inner_text` with `.to_s[/\w /]`, but I don't get the result I expect.
