Scrape Historical Bitcoin Data from Coinmarketcap with BeautifulSoup

Time:11-25

I'm trying to scrape historical Bitcoin data from coinmarketcap.com to get the close, volume, date, high and low values from the beginning of the year until Sep 30, 2021. I'm new to scraping with Python, and after going through threads and videos for hours I still can't find my mistake (or is there something about the website that I'm not detecting?). The following is my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd


closeList = []
volumeList = []
dateList = []
highList = []
lowList = []

website = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'

r = requests.get(website)
soup = BeautifulSoup(r.text, 'lxml')

tr = soup.find_all('tr')
FullData = []
for item in tr:
    closeList.append(item.find_all('td')[4].text)
    volumeList.append(item.find_all('td')[5].text)
    dateList.append(item.find('td',{'style':'text-align: left;'}).text)
    highList.append(item.find_all('td')[2].text)
    lowList.append(item.find_all('td')[3].text)
    FullData.append([closeList,volumeList,dateList,highList,lowList])

df_columns = ["close", "volume", "date", "high", "low"]

df = pd.DataFrame(FullData, columns = df_columns)
print(df)

As a result I only get:

Empty DataFrame
Columns: [close, volume, date, high, low]
Index: []

The task requires me to scrape with BeautifulSoup and then export to CSV (which is then simply df.to_csv). Can somebody help me out? That would be highly appreciated.

User response:

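The data is loaded dynamically by JavaScript from the JSON response of an API call, so the HTML returned by requests contains no table rows for BeautifulSoup to find. You can get the data directly from that API as follows: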

Code:

import requests
import pandas as pd

# JSON endpoint the page calls; timeStart/timeEnd are Unix timestamps in seconds
api_url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1632441600&timeEnd=1637712000'
r = requests.get(api_url)
r.raise_for_status()  # stop early on an HTTP error

data = []
for item in r.json()['data']['quotes']:
    quote = item['quote']
    # One row per day, in the same order as cols below
    data.append([quote['close'], quote['volume'], quote['timestamp'], quote['high'], quote['low']])

cols = ["close", "volume", "date", "high", "low"]

df = pd.DataFrame(data, columns=cols)
print(df)
# Uncomment to export the table to CSV
# df.to_csv('info.csv', index=False)

Output:

           close        volume                      date          high           low
0   42839.751696  4.283935e+10  2021-09-24T23:59:59.999Z  45080.491063  40936.557169
1   42716.593147  3.160472e+10  2021-09-25T23:59:59.999Z  42996.259704  41759.920425
2   43208.539105  3.066122e+10  2021-09-26T23:59:59.999Z  43919.300970  40848.461660
3   42235.731847  3.098003e+10  2021-09-27T23:59:59.999Z  44313.245882  42190.632576
4   41034.544665  3.021494e+10  2021-09-28T23:59:59.999Z  42775.146142  40931.662500
..           ...           ...                       ...           ...           ...
56  58119.576194  3.870241e+10  2021-11-19T23:59:59.999Z  58351.113266  55705.180685
57  59697.197134  3.062426e+10  2021-11-20T23:59:59.999Z  59859.880442  57469.725661
58  58730.476639  2.612345e+10  2021-11-21T23:59:59.999Z  60004.426383  58618.931432
59  56289.287323  3.503612e+10  2021-11-22T23:59:59.999Z  59266.358468  55679.840404
60  57569.074876  3.748580e+10  2021-11-23T23:59:59.999Z  57875.516397  55632.759912

[61 rows x 5 columns]
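
The URL above covers Sep 24 to Nov 23, 2021, while the question asks for the start of the year until Sep 30, 2021. The following is a rough sketch of how the range could be adjusted. It assumes the endpoint accepts any timeStart/timeEnd pair in a single request, which is not confirmed here. The helper names to_unix, build_url and quotes_to_csv are hypothetical, and the sample payload is made up from the first output row only to show the expected JSON shape. It uses the standard library's csv module instead of pandas so that the offline part runs without extra packages.

```python
import csv
import io
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint and the id/convertId values are copied from the URL in the answer above.
BASE_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"


def to_unix(year, month, day):
    """Return the Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_url(start, end, coin_id=1, convert_id=2781):
    """Build the historical-data URL for a range of Unix timestamps."""
    params = {"id": coin_id, "convertId": convert_id,
              "timeStart": start, "timeEnd": end}
    return f"{BASE_URL}?{urlencode(params)}"


def quotes_to_csv(payload):
    """Flatten payload['data']['quotes'] into CSV text with the five columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["close", "volume", "date", "high", "low"])
    for item in payload["data"]["quotes"]:
        quote = item["quote"]
        # Keep only the calendar date from values like 2021-09-24T23:59:59.999Z.
        date = quote["timestamp"][:10]
        writer.writerow([quote["close"], quote["volume"], date,
                         quote["high"], quote["low"]])
    return buffer.getvalue()


# In the answer's URL, a timeEnd of Nov 24 00:00 UTC gave a last row dated
# Nov 23, so the end is set to Oct 1 in order to include Sep 30.
url = build_url(to_unix(2021, 1, 1), to_unix(2021, 10, 1))
print(url)

# Hypothetical sample shaped like the response used above (values rounded
# from the first output row); replace with requests.get(url).json().
sample = {"data": {"quotes": [{"quote": {
    "close": 42839.751696, "volume": 42839350000.0,
    "timestamp": "2021-09-24T23:59:59.999Z",
    "high": 45080.491063, "low": 40936.557169}}]}}
csv_text = quotes_to_csv(sample)
print(csv_text)
```

With pandas, the same date trimming could be done on the DataFrame before calling df.to_csv; the csv module is used here only to keep the sketch dependency-free.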