Webscraping Coinmarketcap [duplicate]

Due to limits on historical data in the CoinMarketCap API plans, I want to scrape the website instead.

However, I am stuck at the first hurdle despite reading the crummy documentation on attributes.

import json 
import requests 
from bs4 import BeautifulSoup


r = requests.get('https://coinmarketcap.com/historical/20210905/')
soup = BeautifulSoup(r.text, 'lxml')
print(soup)

The output contains the data I am trying to scrape, namely:

Market Cap, Price and Circulating Supply for BTC at 5th September 2021.

The data appears in the output soon after <script id="__NEXT_DATA__" type="application/json"> and for this reason I thought that using __NEXT_DATA__ as the attribute id would allow me to access the data. Unfortunately not.

An example of the data structure where the data is contained looks as follows:

"listingHistorical":{"data":[{"id":1,"name":"Bitcoin","symbol":"BTC","slug":"bitcoin","num_market_pairs":8848,"date_added":"2013-04-28T00:00:00.000Z","tags":["mineable","pow","sha-256","store-of-value","state-channels","coinbase-ventures-portfolio","three-arrows-capital-portfolio","polychain-capital-portfolio","binance-labs-portfolio","arrington-xrp-capital","blockchain-capital-portfolio","boostvc-portfolio","cms-holdings-portfolio","dcg-portfolio","dragonfly-capital-portfolio","electric-capital-portfolio","fabric-ventures-portfolio","framework-ventures","galaxy-digital-portfolio","huobi-capital","alameda-research-portfolio","a16z-portfolio","1confirmation-portfolio","winklevoss-capital","usv-portfolio","placeholder-ventures-portfolio","pantera-capital-portfolio","multicoin-capital-portfolio","paradigm-xzy-screener"],"max_supply":21000000,"circulating_supply":18807550,"total_supply":18807550,"platform":null,"cmc_rank":1,"last_updated":"2021-09-05T23:00:00.000Z","quote":{"BTC":{"price":1,"volume_24h":585906.8067215424,"percent_change_1h":0,"percent_change_24h":0,"percent_change_7d":0,"market_cap":18807550,"fully_diluted_market_cap":null,"last_updated":"2021-09-05T23:59:03.000Z"},"USD":{"price":51753.41192620951,"volume_24h":30322676318.63,"percent_change_1h":-0.159917099159,"percent_change_24h":3.621580803777,"percent_change_7d":5.987281074996,"market_cap":973354882472.7817,"last_updated":"2021-09-05T23:00:00.000Z"}},"rank":1,"noLazyLoad":true},

Is there a simple solution for this?

CodePudding user response：

This is just for the listing table, which is fully loaded on the page.

The URL https://coinmarketcap.com/historical/20210905/ encodes the date as 20210905, i.e. 2021-09-05. Replace it with the desired date (for example https://coinmarketcap.com/historical/20210101/) and the page will show that day's data. Then scrape the page and extract the JSON data.
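The date substitution can be sketched as a small helper. This is only an illustration: `historical_url` and `BASE_URL` are names made up here, and it assumes the path segment is always the date formatted as YYYYMMDD, as in the two URLs above.

```python
from datetime import date

# Base of the snapshot pages, taken from the URLs quoted in this answer.
BASE_URL = "https://coinmarketcap.com/historical/"


def historical_url(day):
    """Build the snapshot URL for a date (hypothetical helper, YYYYMMDD path)."""
    return f"{BASE_URL}{day.strftime('%Y%m%d')}/"


print(historical_url(date(2021, 9, 5)))
# https://coinmarketcap.com/historical/20210905/
```

The resulting string can be passed straight to requests.get.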

CodePudding user response：

You can try something like this:

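import json
import requests
from bs4 import BeautifulSoup

r = requests.get('https://coinmarketcap.com/historical/20210905/')
soup = BeautifulSoup(r.text, 'lxml')

# The tag is <script id="__NEXT_DATA__" type="application/json">, so the id alone is enough
data = json.loads(soup.find('script', id='__NEXT_DATA__').string)

# Note: "listingHistorical" may sit deeper in the payload; adjust the path if this raises KeyError
historical_data = data['listingHistorical']['data']
print(historical_data)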
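The question only shows the JSON from "listingHistorical" onward, so the exact nesting inside the __NEXT_DATA__ payload is not certain. A recursive lookup avoids hard-coding the path. The following is a rough sketch, not a definitive implementation: `find_key` and `summarize` are hypothetical helpers, and the "props"/"initialState" wrapper in the sample is an assumed layout used only to show the search working. The field names come from the record in the question.

```python
import json


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def summarize(listing, symbol="BTC", currency="USD"):
    """Return price, market cap and circulating supply for one coin, or None."""
    for coin in listing["data"]:
        if coin["symbol"] == symbol:
            quote = coin["quote"][currency]
            return {
                "price": quote["price"],
                "market_cap": quote["market_cap"],
                "circulating_supply": coin["circulating_supply"],
            }
    return None


# Trimmed sample record; the outer wrapper keys are an assumption.
sample = json.loads("""
{"props": {"initialState": {"listingHistorical": {"data": [
  {"symbol": "BTC", "circulating_supply": 18807550,
   "quote": {"USD": {"price": 51753.41192620951,
                     "market_cap": 973354882472.7817}}}
]}}}}
""")

btc = summarize(find_key(sample, "listingHistorical"))
print(btc)
```

With the real page, pass the dict returned by json.loads in place of `sample`.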