I'm trying to scrape the "Biggest Gainers" list of coins on https://coinmarketcap.com/
How do I access the nth child (Biggest Gainers) in the div with class_ = 'sc-1rmt1nr-0 sc-1rmt1nr-2 iMyvIy'?
I managed to get the data from the "Trending" section, but I'm having trouble targeting the top 3 text items under "Biggest Gainers".
I get this error: AttributeError: 'NoneType' object has no attribute 'p'
from bs4 import BeautifulSoup
import requests
source = requests.get('https://coinmarketcap.com/').text
soup = BeautifulSoup(source, 'lxml')
section = soup.find(class_='sc-1rmt1nr-0 sc-1rmt1nr-2 iMyvIy')
# List the top 3 gainers
for top_gainers in section.find_all(class_='sc-16r8icm-0 sc-1uagfi2-0 bdEGog sc-1rmt1nr-1 eCWTbV')[1]:
    top_gainers = top_gainers.find(class_='sc-1eb5slv-0 iworPT')
    top_coins = top_gainers.p.text
    print(top_coins)
CodePudding user response:
I would avoid those dynamic classes and instead use :-soup-contains together with CSS combinators: first locate the desired block by its text, then use the combinators to specify how the elements you want to extract relate to that block.
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
soup = bs(requests.get("https://coinmarketcap.com/").text, "lxml")
biggest_gainers = []
for i in soup.select(
    'div[color=text]:has(span:-soup-contains("Biggest Gainers")) > div ~ div'
):
    biggest_gainers.append(
        {
            "rank": int(i.select_one(".rank").text),
            "currency": i.select_one(".alias").text,
            "% change": f"{i.select_one('.icon-Caret-up').next_sibling}",
        }
    )
gainers = pd.DataFrame(biggest_gainers)
gainers
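To see what the selector does without depending on the live page, here is a minimal offline sketch. The HTML string is made up and only imitates the layout the selector assumes (a div with color=text whose first child div holds the heading and whose later sibling divs hold the rows); the real markup may differ. It also assumes a soupsieve version recent enough to support :-soup-contains.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the layout the selector assumes.
HTML = """
<div class="wrapper">
  <div color="text">
    <div><span>Trending</span></div>
    <div><span class="rank">1</span><span class="alias">AAA</span></div>
  </div>
  <div color="text">
    <div><span>Biggest Gainers</span></div>
    <div><span class="rank">1</span><span class="alias">BBB</span><span><span class="icon-Caret-up"></span>12.50%</span></div>
    <div><span class="rank">2</span><span class="alias">CCC</span><span><span class="icon-Caret-up"></span>8.10%</span></div>
  </div>
</div>
"""


def biggest_gainers(html):
    soup = BeautifulSoup(html, "html.parser")
    # :has(...) picks the block whose heading text matches;
    # "> div ~ div" keeps its child divs that follow an earlier sibling div,
    # which skips the heading row.
    rows = soup.select(
        'div[color=text]:has(span:-soup-contains("Biggest Gainers")) > div ~ div'
    )
    return [
        {
            "rank": int(row.select_one(".rank").text),
            "currency": row.select_one(".alias").text,
            # The percentage is the text node right after the caret icon.
            "% change": str(row.select_one(".icon-Caret-up").next_sibling),
        }
        for row in rows
    ]


print(biggest_gainers(HTML))
```

The "Trending" block is ignored because its heading text does not match, even though it has the same structure.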
CodePudding user response:
As mentioned by @QHarr, you should avoid dynamic identifiers. Similar to his approach, the selection here relies on :-soup-contains() and the known text of the element:
soup.select('div:has(>div>span:-soup-contains("Biggest Gainers")) ~ div')
To extract the texts, I used stripped_strings and zipped the result with the keys into a dict:
dict(zip(['rank','name','alias','change'],e.stripped_strings))
Example
from bs4 import BeautifulSoup
import requests
url = 'https://coinmarketcap.com/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
data = []
for e in soup.select('div:has(>div>span:-soup-contains("Biggest Gainers")) ~ div'):
    data.append(dict(zip(['rank','name','alias','change'],e.stripped_strings)))
Output
[{'rank': '1', 'name': 'Tenset', 'alias': '10SET', 'change': '1406.99'},
{'rank': '2', 'name': 'Burn To Earn', 'alias': 'BTE', 'change': '348.89'},
{'rank': '3', 'name': 'MetaCars', 'alias': 'MTC', 'change': '332.05'}]
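One thing to keep in mind with the zip approach: zip stops at the shorter input, so a row with a missing text node would silently produce a dict with fewer keys. A small sketch of the same idea with a length check is below; the row HTML and the row_to_dict helper are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

KEYS = ["rank", "name", "alias", "change"]


def row_to_dict(row_html, keys=KEYS):
    """Map the non-empty text nodes of one row onto the given keys."""
    row = BeautifulSoup(row_html, "html.parser")
    values = list(row.stripped_strings)
    # zip() would silently truncate, so fail loudly on a mismatch instead.
    if len(values) != len(keys):
        raise ValueError(f"expected {len(keys)} strings, got {len(values)}: {values}")
    return dict(zip(keys, values))


# Hypothetical row markup, only for demonstration.
ROW = (
    '<div><span class="rank">1</span><p>Tenset</p>'
    '<span>10SET</span><span><span class="icon"></span>1406.99</span></div>'
)

print(row_to_dict(ROW))
```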
CodePudding user response:
You can use :nth-of-type to locate the "Biggest Gainers" parent div:
import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://coinmarketcap.com/').text, 'html.parser')
bg = d.select_one('div:nth-of-type(2).sc-16r8icm-0.sc-1uagfi2-0.bdEGog.sc-1rmt1nr-1.eCWTbV')
data = [{'rank': i.select_one('span.rank').text,
         'name': i.select_one('p.sc-1eb5slv-0.iworPT').text,
         'change': i.select_one('span.sc-27sy12-0.gLZJFn').text}
        for i in bg.select('div.sc-1rmt1nr-0.sc-1rmt1nr-4.eQRTPY')]
Output:
[{'rank': '1', 'name': 'Tenset', 'change': '1308.72%'}, {'rank': '2', 'name': 'Burn To Earn', 'change': '421.82%'}, {'rank': '3', 'name': 'Aigang', 'change': '329.63%'}]
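For reference, here is a small offline sketch of how :nth-of-type behaves, using made-up HTML and a hypothetical card class rather than the site's generated class names. Note that select_one returns None when nothing matches, which is the same situation that leads to the 'NoneType' object has no attribute error in the question, so the sketch checks for it before reading any text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling cards, the second being "Biggest Gainers".
HTML = """
<section>
  <div class="card"><h3>Trending</h3><p>AAA</p></div>
  <div class="card"><h3>Biggest Gainers</h3><p>BBB</p><p>CCC</p></div>
  <div class="card"><h3>Recently Added</h3><p>DDD</p></div>
</section>
"""


def names_in_nth_card(html, n):
    soup = BeautifulSoup(html, "html.parser")
    # Matches a div with class "card" that is the n-th div among its siblings.
    card = soup.select_one(f"div.card:nth-of-type({n})")
    if card is None:
        # Nothing matched (e.g. the layout changed), so return an empty list
        # instead of failing on attribute access.
        return []
    return [p.get_text(strip=True) for p in card.select("p")]


print(names_in_nth_card(HTML, 2))
print(names_in_nth_card(HTML, 9))
```

Because :nth-of-type counts sibling elements by tag name and not by class, the index shifts if the site inserts another div before the block, so this approach is more fragile than matching on the heading text.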