I'm working on a scraping project, but I've run into a problem:
I want to get all the data from https://coinmarketcap.com/all/views/all/ with Nokogiri, but I only get 20 of the 200 crypto names on the page.
The code:
require 'nokogiri'
require 'open-uri'
require 'rubygems'

def scrapper
  return doc = Nokogiri::HTML(URI.open('https://coinmarketcap.com/all/views/all/'))
end

def fusiontab(tab1, tab2)
  return Hash[tab1.zip(tab2)]
end

def crypto(page)
  array_name = []
  array_value = []
  name_of_crypto = page.xpath('//tr//td[3]')
  value_of_crypto = page.xpath('//tr//td[5]')
  hash = {}
  name_of_crypto.each { |name|
    array_name << name.text
  }
  value_of_crypto.each { |price|
    array_value << price.text
  }
  hash = fusiontab(array_name, array_value)
  return hash
end

puts crypto(scrapper)
Can you help me get all the cryptocurrencies?
CodePudding user response:
The URL you're using does not deliver all the data as HTML; much of it is rendered by JavaScript after the page has loaded, and Nokogiri only sees the HTML that the server sends.
Looking at the page source, the data appears to be rendered from a JSON script embedded in the page.
It took quite some time to work out which part of the JSON data holds the contents that you want to work with:
- The JSON object within the HTML, as a String object:
page.css('script[type="application/json"]').first.inner_html
- The JSON String converted to a Ruby Hash:
my_json = JSON.parse(page.css('script[type="application/json"]').first.inner_html)
- The position inside the JSON of the Array of crypto Hashes:
cryptos = my_json["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]
- Pretty-print the first "crypto":
2.7.2 :142 > pp cryptos.first
{"id"=>1,
 "name"=>"Bitcoin",
 "symbol"=>"BTC",
 "slug"=>"bitcoin",
 "tags"=>
  ["mineable",
   "pow",
   "sha-256",
   "store-of-value",
   "state-channel",
   "coinbase-ventures-portfolio",
   "three-arrows-capital-portfolio",
   "polychain-capital-portfolio",
   "binance-labs-portfolio",
   "blockchain-capital-portfolio",
   "boostvc-portfolio",
   "cms-holdings-portfolio",
   "dcg-portfolio",
   "dragonfly-capital-portfolio",
   "electric-capital-portfolio",
   "fabric-ventures-portfolio",
   "framework-ventures-portfolio",
   "galaxy-digital-portfolio",
   "huobi-capital-portfolio",
   "alameda-research-portfolio",
   "a16z-portfolio",
   "1confirmation-portfolio",
   "winklevoss-capital-portfolio",
   "usv-portfolio",
   "placeholder-ventures-portfolio",
   "pantera-capital-portfolio",
   "multicoin-capital-portfolio",
   "paradigm-portfolio"],
 "cmcRank"=>1,
 "marketPairCount"=>9158,
 "circulatingSupply"=>18960043,
 "selfReportedCirculatingSupply"=>0,
 "totalSupply"=>18960043,
 "maxSupply"=>21000000,
 "isActive"=>1,
 "lastUpdated"=>"2022-02-16T14:26:00.000Z",
 "dateAdded"=>"2013-04-28T00:00:00.000Z",
 "quotes"=>
  [{"name"=>"USD",
    "price"=>43646.858047604175,
    "volume24h"=>20633664171.70021,
    "marketCap"=>827546305397.4712,
    "percentChange1h"=>-0.86544168,
    "percentChange24h"=>-1.6482985,
    "percentChange7d"=>-0.73945082,
    "lastUpdated"=>"2022-02-16T14:26:00.000Z",
    "percentChange30d"=>2.18336134,
    "percentChange60d"=>-6.84146969,
    "percentChange90d"=>-26.08073361,
    "fullyDilluttedMarketCap"=>916584018999.69,
    "marketCapByTotalSupply"=>827546305397.4712,
    "dominance"=>42.1276,
    "turnover"=>0.02493355,
    "ytdPriceChangePercentage"=>-8.4718}],
 "isAudited"=>false,
 "rank"=>1,
 "hasFilters"=>false,
 "quote"=>
  {"USD"=>
    {"name"=>"USD",
     "price"=>43646.858047604175,
     "volume24h"=>20633664171.70021,
     "marketCap"=>827546305397.4712,
     "percentChange1h"=>-0.86544168,
     "percentChange24h"=>-1.6482985,
     "percentChange7d"=>-0.73945082,
     "lastUpdated"=>"2022-02-16T14:26:00.000Z",
     "percentChange30d"=>2.18336134,
     "percentChange60d"=>-6.84146969,
     "percentChange90d"=>-26.08073361,
     "fullyDilluttedMarketCap"=>916584018999.69,
     "marketCapByTotalSupply"=>827546305397.4712,
     "dominance"=>42.1276,
     "turnover"=>0.02493355,
     "ytdPriceChangePercentage"=>-8.4718}}}
- The value for the first "crypto":
cryptos.first["quote"]["USD"]["price"]
- The key that you use in your Hash for the first "crypto":
cryptos.first["symbol"]
Put it all together and you get the following code (looping through each "crypto" with each_with_object):
require 'json'
require 'nokogiri'
require 'open-uri'
...
def crypto(page)
  # Parse the JSON embedded in the page and navigate to the Array of cryptos
  my_json = JSON.parse(page.css('script[type="application/json"]').first.inner_html)
  cryptos = my_json["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]
  # Build a Hash of crypto name => USD price
  hash = cryptos.each_with_object({}) do |crypto, hsh|
    hsh[crypto["name"]] = crypto["quote"]["USD"]["price"]
  end
  return hash
end
puts crypto(scrapper)
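If some entries turn out to have no USD quote (an assumption, nothing above confirms that this happens), the chained `[]` lookups would raise a `NoMethodError` on `nil`. A possible variant of the loop, sketched here with a hypothetical `prices_by_name` helper and made-up sample entries, skips such entries instead:

```ruby
require 'json'

# Hypothetical helper: builds { name => USD price } from the Array of crypto
# Hashes, skipping entries that have no USD price instead of raising.
def prices_by_name(cryptos)
  cryptos.each_with_object({}) do |crypto, hsh|
    price = crypto.dig('quote', 'USD', 'price')
    hsh[crypto['name']] = price unless price.nil?
  end
end

# Made-up sample entries, shaped like the ones in the embedded JSON.
sample = JSON.parse(<<~JSON)
  [
    {"name": "Bitcoin",  "quote": {"USD": {"price": 43646.85}}},
    {"name": "NoQuote",  "quote": {}},
    {"name": "Ethereum", "quote": {"USD": {"price": 3100.5}}}
  ]
JSON

puts prices_by_name(sample)
```

Because the helper takes the Array rather than the page, it can be tried on sample data like this without any network access.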