I am relatively new to web scraping with Python, and I am having a lot of difficulty pulling the name value out of an HTML table row on CoinMarketCap.com, whose page structure is unfamiliar to me. I have tried several methods found on Stack Overflow and other sites, to no avail. Here is a snippet of their HTML: https://i.stack.imgur.com/eBamV.png

This is the code I currently have:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://coinmarketcap.com/rankings/exchanges/").text
soup = BeautifulSoup(page, features="html.parser")
tags = soup.findAll("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")
tables = soup.findChildren('tr')
my_table = tables[0]
rows = my_table.findChildren(['td'])
print(rows)
for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        value = cell.string
        print("the value in this cell is %s" % value)
Thanks in advance for any help!
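In the loop above, rows already contains td cells, so row.findChildren('td') returns an empty list and the inner loop never runs. The usual pattern is to loop over the tr rows first and then over the td cells of each row. A minimal sketch against a small made-up table (the markup, and the assumption that the name sits in the second column, are illustrative only and not CoinMarketCap's real structure):

```python
from bs4 import BeautifulSoup

# Hypothetical table standing in for the real page markup.
html = """
<table>
  <tr><th>#</th><th>Name</th></tr>
  <tr><td>1</td><td>Binance</td></tr>
  <tr><td>2</td><td>FTX</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
# Loop over the rows first, then over the cells inside each row.
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if not cells:  # the header row only has <th> cells
        continue
    # get_text(strip=True) still works when a cell has several child elements,
    # where .string would return None.
    names.append(cells[1].get_text(strip=True))

print(names)
```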
Answer:
The data you see is embedded in the page as JSON, inside the element with id __NEXT_DATA__. To parse it, you could use the following example:
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://coinmarketcap.com/rankings/exchanges/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = soup.select_one("#__NEXT_DATA__").text
data = json.loads(data)
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
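# the exchange list sits under props -> initialProps -> pageProps -> exchange
df = pd.json_normalize(data["props"]["initialProps"]["pageProps"]["exchange"])
# DataFrame.to_markdown() needs the optional "tabulate" package installed
print(df.head().to_markdown())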
Prints:
| | id | name | slug | score | countries | fiats | totalVol24h | spotVol24h | derivativesVol24h | derivativesOpenInterests | derivativesMarketPairs | totalVolChgPct24h | totalVolChgPct7d | visits | liquidity | numMarkets | numCoins | dateLaunched | lastUpdated | marketSharePct | type | makerFee | takerFee | rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 270 | Binance | binance | 9.9 | [] | ['AED', 'ARS', 'AUD', 'AZN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EGP', 'EUR', 'GBP', 'GHS', 'HKD', 'HRK', 'HUF', 'IDR', 'ILS', 'INR', 'ISK', 'JPY', 'KES', 'KRW', 'KZT', 'MXN', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'RON', 'RUB', 'SAR', 'SEK', 'SGD', 'THB', 'TRY', 'TWD', 'UAH', 'UGX', 'USD', 'UYU', 'VND', 'ZAR'] | 5.56801e+10 | 1.42812e+10 | 4.21641e+10 | 1.57537e+10 | 203 | -20.7533 | -65.7038 | 2.20602e+07 | 816 | 1667 | 394 | 2017-07-14T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0023 | | 0.02 | 0.04 | 1 |
| 1 | 524 | FTX | ftx | 8.3819 | [] | ['USD', 'EUR', 'GBP', 'AUD', 'HKD', 'SGD', 'ZAR', 'CAD', 'CHF', 'BRL'] | 7.57339e+09 | 2.12004e+09 | 5.61716e+09 | 3.46104e+09 | 43 | -21.1298 | -58.9183 | 4.71841e+06 | 722 | 466 | 326 | 2019-02-25T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0003 | | 0.02 | 0.07 | 2 |
| 2 | 89 | Coinbase Exchange | coinbase-exchange | 8.303 | [] | ['USD', 'EUR', 'GBP'] | 1.80697e+09 | 1.80757e+09 | nan | nan | nan | -13.3741 | -68.7096 | 2.19108e+06 | 717 | 503 | 173 | 2014-05-24T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0003 | | 0 | 0 | 3 |
| 3 | 24 | Kraken | kraken | 7.9853 | [] | ['USD', 'EUR', 'GBP', 'CAD', 'JPY', 'CHF', 'AUD'] | 8.10391e+08 | 7.66352e+08 | 2.74902e+11 | 4.01852e+07 | 28 | -14.7475 | -63.5845 | 1.72099e+06 | 739 | 542 | 167 | 2011-07-28T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0001 | | 0.02 | 0.05 | 4 |
| 4 | 311 | KuCoin | kucoin | 7.486 | [] | ['USD', 'AED', 'ARS', 'AUD', 'AGN', 'BGN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CRC', 'CZK', 'DKK', 'DOP', 'EUR', 'GBP', 'GEL', 'HKD', 'HUF', 'ILS', 'INR', 'JPY', 'KRW', 'KZT', 'MAD', 'MDL', 'MXN', 'MYR', 'NAD', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'QAR', 'RON', 'RUB', 'SEK', 'SGD', 'TRY', 'TWD', 'UAH', 'USD', 'UYU', 'UZS', 'ZAR'] | 5.17875e+09 | 1.58063e+09 | 3.61257e+09 | 9.08548e+08 | 112 | -12.0398 | -62.4081 | 2.55465e+06 | 547 | 1291 | 696 | 2017-08-13T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0002 | | 0 | 0 | 5 |
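Since the question only asks for the names, the same JSON can also be read without pandas. A minimal sketch, assuming the props / initialProps / pageProps / exchange path used in the answer above still holds (CoinMarketCap may change it at any time). The function exchange_names and the trimmed sample page are hypothetical; the sample lets the function run offline:

```python
import json
from typing import List

from bs4 import BeautifulSoup


def exchange_names(html: str) -> List[str]:
    """Return the exchange names stored in the page's __NEXT_DATA__ JSON.

    Assumes the layout props -> initialProps -> pageProps -> exchange.
    """
    soup = BeautifulSoup(html, "html.parser")
    script = soup.select_one("#__NEXT_DATA__")
    if script is None:  # no embedded JSON on this page
        return []
    data = json.loads(script.string)
    exchanges = data["props"]["initialProps"]["pageProps"]["exchange"]
    return [item["name"] for item in exchanges]


# Hypothetical, heavily trimmed page used to exercise the function offline.
sample_html = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"exchange": [
  {"id": 270, "name": "Binance"},
  {"id": 524, "name": "FTX"}
]}}}}
</script>
</body></html>
"""

print(exchange_names(sample_html))
```

Against the live site you would pass requests.get(url).text instead of sample_html.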
Answer:
The value sc-16r8icm-0 sc-1teo54s-1 dNOTPP is actually three classes separated by spaces. To identify an element by multiple classes, use a CSS selector like this:
tags = soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")
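A small sketch of the difference, using made-up markup that reuses the class names from the question. With select(), an element matches if it has all three classes in any order; passing the same space-separated string to find_all(class_=...) only matches an element whose class attribute is exactly that string:

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names from the question.
html = """
<div class="sc-16r8icm-0 sc-1teo54s-1 dNOTPP">Binance</div>
<div class="dNOTPP sc-16r8icm-0 sc-1teo54s-1">FTX</div>
<div class="sc-16r8icm-0">Other</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: every div that has all three classes, in any order.
by_selector = [
    div.get_text()
    for div in soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")
]

# class_ with a space-separated string: only the exact attribute value matches,
# so the div listing the same classes in a different order is missed.
by_exact_string = [
    div.get_text()
    for div in soup.find_all("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")
]

print(by_selector)
print(by_exact_string)
```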