BeautifulSoup won't let me use the find_all() command

Time:11-13

I am working on an independent project where I want to scrape all historical data for a cryptocurrency and store it in a pandas DataFrame. I have identified the structure of the HTML page and have the following code:

from bs4 import BeautifulSoup
import urllib3
import requests
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


bitcoin_df = pd.DataFrame(columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap'])

bitcoin_url = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
bitcoin_content = requests.get(bitcoin_url).text
bitcoin_soup = BeautifulSoup(bitcoin_content, "lxml")
#print(bitcoin_soup.prettify())

bitcoin_table = bitcoin_soup.find("table", attrs={"class": "h7vnx2-2 hLKazY cmc-table  "})
bitcoin_table_data = bitcoin_table.find_all("tr")

for tr in bitcoin_table_data:
    tds = tr.find_all("td")
    for td in tds:
        bitcoin_df.append({'Date': td[0].text, 'Open': td[1].text, 'High': td[2].text, 'Low': td[3].text, 'Close': td[4].text, 'Volume': td[5].text, 'Market Cap': td[6].text})

However, I encounter this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-46-316341b6771b> in <module>
      7 
      8 bitcoin_table = bitcoin_soup.find("table", attrs={"class": "h7vnx2-2 hLKazY cmc-table  "})
----> 9 bitcoin_table_data = bitcoin_table.find_all("tr")
     10 
     11 #for tr in bitcoin_soup.find_all('tr'):
AttributeError: 'NoneType' object has no attribute 'find_all'

CodePudding user response:

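You are getting that error because the .find() call returned None, which is how BeautifulSoup indicates that it could not locate the table. Calling .find_all() on None then raises the AttributeError. The table is built by JavaScript running inside a browser, so it is not present in the HTML that requests.get() downloads.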
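To make the failure easier to see, here is a minimal sketch that runs the same kind of lookup against two small HTML strings. Both strings, the extract_rows name and the simplified "cmc-table" class are made up for illustration; they are not the real page. The sketch also reads the cells of each row as a list, since tr.find_all("td") returns the cells and each individual td cannot be indexed with td[0].

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for illustration only:
# raw_html mimics what the server sends before any JavaScript has run,
# rendered_html mimics the page after the browser has filled in the table.
raw_html = "<html><body><div id='app'></div></body></html>"
rendered_html = """
<table class="cmc-table">
  <tr><th>Date</th><th>Open</th></tr>
  <tr><td>Sep 13, 2021</td><td>46057.21</td></tr>
</table>
"""


def extract_rows(html):
    """Return the table rows as lists of cell text, or None if there is no table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="cmc-table")
    if table is None:
        # find() found nothing, so there is nothing to call find_all() on.
        return None
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row only contains <th> cells, so skip it
            rows.append(cells)
    return rows


print(extract_rows(raw_html))
print(extract_rows(rendered_html))
```

Checking for None before calling find_all() turns the confusing AttributeError into an explicit "table not found" case that you can handle.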

Rather than trying to parse the HTML, you could just request the data directly from their API (as the browser does). For example:

import pandas as pd
import requests
import time

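# Current Unix time in seconds; the query covers the 61 days (5270400 seconds) before it
ts = int(time.time())
json_url = f"https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart={ts - 5270400}&timeEnd={ts}"
json_req = requests.get(json_url)
# Decode the JSON response body into Python dicts and lists
json_data = json_req.json()
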
data = []

for quote in json_data['data']['quotes']:
    data.append([
        quote['quote']['timestamp'],
        quote['quote']['open'],
        quote['quote']['high'],
        quote['quote']['low'],
        quote['quote']['close'],
        quote['quote']['volume'],
        quote['quote']['marketCap'],
    ])
    
df = pd.DataFrame(data, columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap'])
print(df)

Which would give you a dataframe starting:

                        Date          Open          High           Low         Close        Volume    Market Cap
0   2021-09-13T23:59:59.999Z  46057.215327  46598.678985  43591.320785  44963.072633  4.096994e+10  8.459805e+11
1   2021-09-14T23:59:59.999Z  44960.049359  47218.125355  44752.331349  47092.493833  3.865215e+10  8.860953e+11
2   2021-09-15T23:59:59.999Z  47097.998123  48450.468466  46773.326543  48176.346393  3.048450e+10  9.065325e+11
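The Date column in that output holds plain strings. If you want real timestamps, for example to plot or resample the data, something along these lines should work. This is only a sketch: the two rows are copied from the output above and cut down to three columns to keep it short.

```python
import pandas as pd

# Two rows copied from the output above, reduced to three columns for brevity.
df = pd.DataFrame(
    [
        ["2021-09-13T23:59:59.999Z", 46057.215327, 44963.072633],
        ["2021-09-14T23:59:59.999Z", 44960.049359, 47092.493833],
    ],
    columns=["Date", "Open", "Close"],
)

# Parse the ISO 8601 strings; the trailing "Z" makes the result timezone-aware (UTC).
df["Date"] = pd.to_datetime(df["Date"])

# Use the timestamps as a sorted index so the frame behaves like a time series.
df = df.set_index("Date").sort_index()

print(df.index.min(), df.index.max())
```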

This URL was found by watching the browser request the data using its own developer tools. I suggest you print(json_data) to see what was returned.
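If you want a different time range, the query string can be built from a number of days instead of a hard-coded number of seconds. The sketch below only constructs the URL and does not call the API. The endpoint and parameter names are the ones from the URL above; build_history_url and its arguments are hypothetical names chosen for this example, and I am assuming the endpoint keeps accepting these parameters.

```python
import time
from urllib.parse import urlencode

# Endpoint taken from the URL in the answer above.
BASE_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"
SECONDS_PER_DAY = 86400


def build_history_url(days, now=None, coin_id=1, convert_id=2781):
    """Build the request URL for the `days` days ending at `now`.

    Hypothetical helper: the default id and convertId values are simply
    copied from the URL in the answer above.
    """
    end = int(time.time()) if now is None else int(now)
    start = end - days * SECONDS_PER_DAY
    query = urlencode(
        {"id": coin_id, "convertId": convert_id, "timeStart": start, "timeEnd": end}
    )
    return f"{BASE_URL}?{query}"


# 61 days is the same window as the 5270400 seconds used above.
# A fixed example timestamp is passed in so the result is reproducible.
url = build_history_url(61, now=1636761600)
print(url)
```

You would then pass the returned string to requests.get() exactly as before.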
