I'm trying to scrape historical Bitcoin data from coinmarketcap.com to get the close, volume, date, high and low values from the beginning of the year until Sep 30, 2021. I'm new to scraping with Python, and after going through threads and videos for hours I still can't find my mistake (or is there something about the website that I'm not detecting?). The following is my code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
closeList = []
volumeList = []
dateList = []
highList = []
lowList = []
website = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'
r = requests.get(website)
soup = BeautifulSoup(r.text, 'lxml')
tr = soup.find_all('tr')
FullData = []
for item in tr:
    closeList.append(item.find_all('td')[4].text)
    volumeList.append(item.find_all('td')[5].text)
    dateList.append(item.find('td',{'style':'text-align: left;'}).text)
    highList.append(item.find_all('td')[2].text)
    lowList.append(item.find_all('td')[3].text)
    FullData.append([closeList,volumeList,dateList,highList,lowList])
df_columns = ["close", "volume", "date", "high", "low"]
df = pd.DataFrame(FullData, columns = df_columns)
print(df)
As a result I only get:
Empty DataFrame
Columns: [close, volume, date, high, low]
Index: []
The task requires me to scrape with BeautifulSoup and then export to CSV (which is then simply df.to_csv). Can somebody help me out? That would be highly appreciated.
CodePudding user response:
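The data is loaded dynamically by JavaScript from an API call that returns JSON, so the HTML that requests.get returns does not contain the table rows and BeautifulSoup finds nothing to parse. You can grab the data directly from the API as follows: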
Code:
import requests
import pandas as pd

# JSON endpoint that serves the historical data
api_url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1632441600&timeEnd=1637712000'
r = requests.get(api_url)

data = []
# Each entry in 'quotes' holds one day's values under the 'quote' key
for item in r.json()['data']['quotes']:
    close = item['quote']['close']
    volume = item['quote']['volume']
    date = item['quote']['timestamp']
    high = item['quote']['high']
    low = item['quote']['low']
    data.append([close, volume, date, high, low])

cols = ["close", "volume", "date", "high", "low"]
df = pd.DataFrame(data, columns=cols)
print(df)
# Uncomment to export the result to CSV
#df.to_csv('info.csv', index=False)
Output:
           close        volume                      date          high           low
0   42839.751696  4.283935e+10  2021-09-24T23:59:59.999Z  45080.491063  40936.557169
1   42716.593147  3.160472e+10  2021-09-25T23:59:59.999Z  42996.259704  41759.920425
2   43208.539105  3.066122e+10  2021-09-26T23:59:59.999Z  43919.300970  40848.461660
3   42235.731847  3.098003e+10  2021-09-27T23:59:59.999Z  44313.245882  42190.632576
4   41034.544665  3.021494e+10  2021-09-28T23:59:59.999Z  42775.146142  40931.662500
..           ...           ...                       ...           ...           ...
56  58119.576194  3.870241e+10  2021-11-19T23:59:59.999Z  58351.113266  55705.180685
57  59697.197134  3.062426e+10  2021-11-20T23:59:59.999Z  59859.880442  57469.725661
58  58730.476639  2.612345e+10  2021-11-21T23:59:59.999Z  60004.426383  58618.931432
59  56289.287323  3.503612e+10  2021-11-22T23:59:59.999Z  59266.358468  55679.840404
60  57569.074876  3.748580e+10  2021-11-23T23:59:59.999Z  57875.516397  55632.759912

[61 rows x 5 columns]
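Note that the timeStart and timeEnd values in the URL above are Unix timestamps for 24 Sep 2021 and 24 Nov 2021 (midnight UTC), not the range asked for in the question. A minimal sketch of how the values for 1 Jan to 30 Sep 2021 could be computed is below. The helper names to_unix and build_history_url are made up for illustration, and the id and convertId values are simply copied from the URL above (presumably Bitcoin and USD). The endpoint is not officially documented, so whether it returns the whole nine-month range in a single call is an assumption you would need to check.

```python
from datetime import datetime, timezone

# Endpoint and query parameter names copied from the URL in the answer above.
BASE_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"


def to_unix(year, month, day):
    """Return the Unix timestamp (in seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_history_url(start, end, coin_id=1, convert_id=2781):
    """Build the request URL (hypothetical helper; defaults taken from the answer's URL)."""
    return (
        f"{BASE_URL}?id={coin_id}&convertId={convert_id}"
        f"&timeStart={start}&timeEnd={end}"
    )


time_start = to_unix(2021, 1, 1)   # 1 Jan 2021, 00:00 UTC
time_end = to_unix(2021, 10, 1)    # 1 Oct 2021, 00:00 UTC, so that 30 Sep is included
api_url = build_history_url(time_start, time_end)
print(api_url)
```

The end of the range is set to midnight on 1 Oct because, in the output above, a timeEnd of midnight on 24 Nov produced a last row dated 23 Nov. The resulting api_url can be dropped into the code above in place of the hard-coded one.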