Ruby Nokogiri take all the content

Time:02-19

I'm working on a scraping project but I've run into a problem:

I want to get all the data from https://coinmarketcap.com/all/views/all/ with Nokogiri, but I only get 20 crypto names out of the 200 loaded on the page.

The code:


require 'nokogiri'
require 'open-uri'

# Download the page and parse it into a Nokogiri document.
def scrapper
  Nokogiri::HTML(URI.open('https://coinmarketcap.com/all/views/all/'))
end

# Pair each element of tab1 with the element of tab2 at the same index.
def fusiontab(tab1, tab2)
  Hash[tab1.zip(tab2)]
end

# Build a Hash from the text of the 3rd and 5th table cells of every row.
def crypto(page)
  array_name = page.xpath('//tr//td[3]').map(&:text)
  array_value = page.xpath('//tr//td[5]').map(&:text)
  fusiontab(array_name, array_value)
end

puts crypto(scrapper)

Can you help me get all the cryptocurrencies?

CodePudding user response:

The URL you're using does not generate all the data as HTML; a lot of it is rendered after the page has been loaded.

Looking at the source code for the page, it appears that the data is rendered from a JSON script, embedded in the page.

It took quite some time to work out which part of the JSON data holds the contents that you want to work with:

The JSON object within the HTML, as a String object:

page.css('script[type="application/json"]').first.inner_html

The JSON String parsed into a Ruby Hash:

JSON.parse(page.css('script[type="application/json"]').first.inner_html)

The position inside the JSON of the Array of crypto Hashes:

cryptos = my_json["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]
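That key path depends on how the site happens to structure its page, so it may change without notice. A slightly more defensive lookup, sketched here with `Hash#dig` on a made-up miniature of the JSON (the sample string, `LISTING_PATH` and `listing_data` are hypothetical names, not part of the site or of Nokogiri):

```ruby
require 'json'

# Made-up miniature of the embedded JSON, with the same nesting as the path above.
SAMPLE_JSON = '{"props":{"initialState":{"cryptocurrency":{"listingLatest":{"data":[{"name":"Bitcoin"}]}}}}}'

LISTING_PATH = %w[props initialState cryptocurrency listingLatest data].freeze

# Returns the Array of crypto Hashes, or an empty Array when a key
# along the path is missing (dig returns nil in that case).
def listing_data(parsed)
  parsed.dig(*LISTING_PATH) || []
end

my_json = JSON.parse(SAMPLE_JSON)
p listing_data(my_json)
p listing_data({})
```

With chained `[]` calls a missing key raises NoMethodError on nil; with `dig` you get an empty Array that the rest of the code can loop over safely.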

Pretty-print the first "crypto":

2.7.2 :142 > pp cryptos.first
{"id"=>1,
 "name"=>"Bitcoin",
 "symbol"=>"BTC",
 "slug"=>"bitcoin",
 "tags"=>
  ["mineable",
   "pow",
   "sha-256",
   "store-of-value",
   "state-channel",
   "coinbase-ventures-portfolio",
   "three-arrows-capital-portfolio",
   "polychain-capital-portfolio",
   "binance-labs-portfolio",
   "blockchain-capital-portfolio",
   "boostvc-portfolio",
   "cms-holdings-portfolio",
   "dcg-portfolio",
   "dragonfly-capital-portfolio",
   "electric-capital-portfolio",
   "fabric-ventures-portfolio",
   "framework-ventures-portfolio",
   "galaxy-digital-portfolio",
   "huobi-capital-portfolio",
   "alameda-research-portfolio",
   "a16z-portfolio",
   "1confirmation-portfolio",
   "winklevoss-capital-portfolio",
   "usv-portfolio",
   "placeholder-ventures-portfolio",
   "pantera-capital-portfolio",
   "multicoin-capital-portfolio",
   "paradigm-portfolio"],
 "cmcRank"=>1,
 "marketPairCount"=>9158,
 "circulatingSupply"=>18960043,
 "selfReportedCirculatingSupply"=>0,
 "totalSupply"=>18960043,
 "maxSupply"=>21000000,
 "isActive"=>1,
 "lastUpdated"=>"2022-02-16T14:26:00.000Z",
 "dateAdded"=>"2013-04-28T00:00:00.000Z",
 "quotes"=>
  [{"name"=>"USD",
    "price"=>43646.858047604175,
    "volume24h"=>20633664171.70021,
    "marketCap"=>827546305397.4712,
    "percentChange1h"=>-0.86544168,
    "percentChange24h"=>-1.6482985,
    "percentChange7d"=>-0.73945082,
    "lastUpdated"=>"2022-02-16T14:26:00.000Z",
    "percentChange30d"=>2.18336134,
    "percentChange60d"=>-6.84146969,
    "percentChange90d"=>-26.08073361,
    "fullyDilluttedMarketCap"=>916584018999.69,
    "marketCapByTotalSupply"=>827546305397.4712,
    "dominance"=>42.1276,
    "turnover"=>0.02493355,
    "ytdPriceChangePercentage"=>-8.4718}],
 "isAudited"=>false,
 "rank"=>1,
 "hasFilters"=>false,
 "quote"=>
  {"USD"=>
    {"name"=>"USD",
     "price"=>43646.858047604175,
     "volume24h"=>20633664171.70021,
     "marketCap"=>827546305397.4712,
     "percentChange1h"=>-0.86544168,
     "percentChange24h"=>-1.6482985,
     "percentChange7d"=>-0.73945082,
     "lastUpdated"=>"2022-02-16T14:26:00.000Z",
     "percentChange30d"=>2.18336134,
     "percentChange60d"=>-6.84146969,
     "percentChange90d"=>-26.08073361,
     "fullyDilluttedMarketCap"=>916584018999.69,
     "marketCapByTotalSupply"=>827546305397.4712,
     "dominance"=>42.1276,
     "turnover"=>0.02493355,
     "ytdPriceChangePercentage"=>-8.4718}}
}

The value of the first "crypto":

cryptos.first["quote"]["USD"]["price"]

The key that you use in your Hash for the first "crypto":

cryptos.first["symbol"]

Put it all together and you get the following code (looping through each "crypto" with each_with_object):

require 'json'
require 'nokogiri'
require 'open-uri'

...

def crypto(page)
  my_json = JSON.parse(page.css('script[type="application/json"]').first.inner_html)
  cryptos = my_json["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]

  hash = cryptos.each_with_object({}) do |crypto, hsh|
    hsh[crypto["name"]] = crypto["quote"]["USD"]["price"]
  end

  return hash
end
puts crypto(scrapper)
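
The hash-building step can also be tried without touching the network. This sketch feeds it a made-up two-element Array shaped like the data above; "Example Coin" and its price are placeholders, and `prices_by_name` is a hypothetical helper name:

```ruby
# Made-up sample shaped like the Array of crypto Hashes above.
SAMPLE_CRYPTOS = [
  { 'name' => 'Bitcoin', 'symbol' => 'BTC',
    'quote' => { 'USD' => { 'price' => 43646.858047604175 } } },
  { 'name' => 'Example Coin', 'symbol' => 'EXC',
    'quote' => { 'USD' => { 'price' => 1.5 } } }
].freeze

# Map each crypto's name to its USD price; dig yields nil if a quote is missing.
def prices_by_name(cryptos)
  cryptos.each_with_object({}) do |crypto, hsh|
    hsh[crypto['name']] = crypto.dig('quote', 'USD', 'price')
  end
end

prices_by_name(SAMPLE_CRYPTOS).each do |name, price|
  puts format('%-15s %.2f', name, price)
end
```

Swapping `crypto['name']` for `crypto['symbol']` gives a Hash keyed by ticker symbol instead.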