Extract Name Text from nested table in Python using Beautiful Soup

I am relatively new to web scraping with Python, and I am having a lot of difficulty pulling the name value out of an HTML table row on CoinMarketCap.com. Their page structure is unfamiliar to me. I have tried several methods from Stack Overflow and other sites, to no avail. Here is a snippet of their HTML: https://i.stack.imgur.com/eBamV.png This is the code I currently have:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://coinmarketcap.com/rankings/exchanges/").text

soup = BeautifulSoup(page, features="html.parser")

tags = soup.findAll("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")

tables = soup.findChildren('tr')

my_table = tables[0]

rows = my_table.findChildren(['td'])

print(rows)

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        value = cell.string
        print("the value in this cell is %s" % value)

Thanks in advance for any help!

CodePudding user response：

The data you see is embedded in the page as JSON. To parse it, you could use the following example:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://coinmarketcap.com/rankings/exchanges/"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = soup.select_one("#__NEXT_DATA__").text
data = json.loads(data)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

df = pd.json_normalize(data["props"]["initialProps"]["pageProps"]["exchange"])

# note: DataFrame.to_markdown() needs the optional "tabulate" package
print(df.head().to_markdown())

Prints:

	id	name	slug	score	countries	fiats	totalVol24h	spotVol24h	derivativesVol24h	derivativesOpenInterests	derivativesMarketPairs	totalVolChgPct24h	totalVolChgPct7d	visits	liquidity	numMarkets	numCoins	dateLaunched	lastUpdated	marketSharePct	makerFee	takerFee	rank
0	270	Binance	binance	9.9	[]	['AED', 'ARS', 'AUD', 'AZN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EGP', 'EUR', 'GBP', 'GHS', 'HKD', 'HRK', 'HUF', 'IDR', 'ILS', 'INR', 'ISK', 'JPY', 'KES', 'KRW', 'KZT', 'MXN', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'RON', 'RUB', 'SAR', 'SEK', 'SGD', 'THB', 'TRY', 'TWD', 'UAH', 'UGX', 'USD', 'UYU', 'VND', 'ZAR']	5.56801e+10	1.42812e+10	4.21641e+10	1.57537e+10	203	-20.7533	-65.7038	2.20602e+07	816	1667	394	2017-07-14T00:00:00.000Z	2022-05-17T20:08:11.000Z	0.0023	0.02	0.04	1
1	524	FTX	ftx	8.3819	[]	['USD', 'EUR', 'GBP', 'AUD', 'HKD', 'SGD', 'ZAR', 'CAD', 'CHF', 'BRL']	7.57339e+09	2.12004e+09	5.61716e+09	3.46104e+09	43	-21.1298	-58.9183	4.71841e+06	722	466	326	2019-02-25T00:00:00.000Z	2022-05-17T20:08:11.000Z	0.0003	0.02	0.07	2
2	89	Coinbase Exchange	coinbase-exchange	8.303	[]	['USD', 'EUR', 'GBP']	1.80697e+09	1.80757e+09	nan	nan	nan	-13.3741	-68.7096	2.19108e+06	717	503	173	2014-05-24T00:00:00.000Z	2022-05-17T20:08:11.000Z	0.0003	0	0	3
3	24	Kraken	kraken	7.9853	[]	['USD', 'EUR', 'GBP', 'CAD', 'JPY', 'CHF', 'AUD']	8.10391e+08	7.66352e+08	2.74902e+11	4.01852e+07	28	-14.7475	-63.5845	1.72099e+06	739	542	167	2011-07-28T00:00:00.000Z	2022-05-17T20:08:11.000Z	0.0001	0.02	0.05	4
4	311	KuCoin	kucoin	7.486	[]	['USD', 'AED', 'ARS', 'AUD', 'AGN', 'BGN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CRC', 'CZK', 'DKK', 'DOP', 'EUR', 'GBP', 'GEL', 'HKD', 'HUF', 'ILS', 'INR', 'JPY', 'KRW', 'KZT', 'MAD', 'MDL', 'MXN', 'MYR', 'NAD', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'QAR', 'RON', 'RUB', 'SEK', 'SGD', 'TRY', 'TWD', 'UAH', 'USD', 'UYU', 'UZS', 'ZAR']	5.17875e+09	1.58063e+09	3.61257e+09	9.08548e+08	112	-12.0398	-62.4081	2.55465e+06	547	1291	696	2017-08-13T00:00:00.000Z	2022-05-17T20:08:11.000Z	0.0002	0	0	5
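
Since the question only asks for the name, here is a minimal sketch of pulling just the names out of that JSON. To keep it runnable offline it parses a small hand-written HTML sample instead of the live page; SAMPLE_HTML and the helper extract_exchange_names are made up for illustration, and the key path is copied from the answer above, so it may no longer match the current CoinMarketCap layout.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page. It only mimics the
# key path used in the answer above; the real payload is much larger.
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"exchange": [
  {"id": 270, "name": "Binance", "rank": 1},
  {"id": 524, "name": "FTX", "rank": 2}
]}}}}
</script>
</body></html>
"""


def extract_exchange_names(html):
    """Return the exchange names found in the embedded JSON, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.select_one("#__NEXT_DATA__")
    if script is None:
        # The script tag is missing, e.g. the site changed its markup.
        return []
    data = json.loads(script.string)
    exchanges = data["props"]["initialProps"]["pageProps"]["exchange"]
    return [exchange["name"] for exchange in exchanges]


names = extract_exchange_names(SAMPLE_HTML)
print(names)
```

For the live site you would pass requests.get(url).text to the same function, assuming the page still ships its data in a script tag with that id.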

CodePudding user response：

sc-16r8icm-0 sc-1teo54s-1 dNOTPP is not a single class name; it is three classes separated by spaces. If you need to identify an element by multiple classes, chain them in a CSS selector like this:

tags = soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")
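
As a rough illustration of how that selector behaves, the sketch below runs it against two hand-written div elements. The HTML sample and its text content are invented for the example; only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Invented sample markup: the first div carries all three classes,
# the second carries only one of them.
SAMPLE_HTML = """
<div class="sc-16r8icm-0 sc-1teo54s-1 dNOTPP"><p>Binance</p></div>
<div class="sc-16r8icm-0 other"><p>not matched</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Chaining .class parts means "has every one of these classes".
tags = soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")

matched_text = [tag.get_text(strip=True) for tag in tags]
print(matched_text)
```

Only the first div is selected, because the second one lacks two of the three classes. Keep in mind that class names of this kind look auto-generated, so they may change whenever the site is redeployed.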