Python - BeautifulSoup - How to target the nth child and print the text

Time:06-11

I'm trying to scrape the "Biggest Gainers" list of coins on https://coinmarketcap.com/

How do I access the nth child (Biggest Gainers) of the div with class_='sc-1rmt1nr-0 sc-1rmt1nr-2 iMyvIy'?

I managed to get the data from the "Trending" section, but I'm having trouble targeting the top 3 text items under "Biggest Gainers".

I get AttributeError: 'NoneType' object has no attribute 'p'

from bs4 import BeautifulSoup
import requests


source = requests.get('https://coinmarketcap.com/').text

soup = BeautifulSoup(source, 'lxml')

section = soup.find(class_='sc-1rmt1nr-0 sc-1rmt1nr-2 iMyvIy')

#List the top 3 Gainers 
for top_gainers in section.find_all(class_='sc-16r8icm-0 sc-1uagfi2-0 bdEGog sc-1rmt1nr-1 eCWTbV')[1]:
    top_gainers = top_gainers.find(class_='sc-1eb5slv-0 iworPT')
    top_coins = top_gainers.p.text
    print(top_coins)
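
A likely reading of that traceback, not checked against the live page: find() returns None when nothing matches, and looping over a single Tag (the [1] element) walks its children, some of which may not contain the class being searched for. The sketch below reproduces the None case on made-up markup (the class names card and row are hypothetical) and shows a guard:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are made up for illustration.
html = """
<div class="card">
  <div class="row"><p>Alpha</p></div>
  <div class="row"><span>no paragraph here</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so chaining .p on it would fail.
missing = soup.find(class_="does-not-exist")

names = []
for row in soup.find_all(class_="row"):
    p = row.find("p")
    if p is None:  # guard instead of calling .text on None
        continue
    names.append(p.get_text(strip=True))

print(missing, names)
```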

CodePudding user response:

I would avoid those dynamic classes and instead use :-soup-contains with combinators: first locate the desired block by its text, then use the combinators to specify how the elements you want to extract relate to that block.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

soup = bs(requests.get("https://coinmarketcap.com/").text, "lxml")
biggest_gainers = []

for i in soup.select(
    'div[color=text]:has(span:-soup-contains("Biggest Gainers")) > div ~ div'
):
    biggest_gainers.append(
        {
            "rank": int(i.select_one(".rank").text),
            "currency": i.select_one(".alias").text,
            "% change": f"{i.select_one('.icon-Caret-up').next_sibling}",
        }
    )

gainers = pd.DataFrame(biggest_gainers)
gainers
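
To try the selector without hitting the site, here is a minimal sketch on hypothetical markup that only mimics the assumed shape (a div[color=text] card whose first child div holds the heading, followed by one div per row). The coin aliases are placeholders, and :-soup-contains needs a reasonably recent soupsieve:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page; only the shape matters.
html = """
<div color="text">
  <div><span>Trending</span></div>
  <div><span class="rank">1</span><span class="alias">AAA</span></div>
</div>
<div color="text">
  <div><span>Biggest Gainers</span></div>
  <div><span class="rank">1</span><span class="alias">XYZ</span></div>
  <div><span class="rank">2</span><span class="alias">ABC</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Pick the card by its heading text, then every child div that follows
# another child div (i.e. the rows after the heading).
rows = soup.select(
    'div[color=text]:has(span:-soup-contains("Biggest Gainers")) > div ~ div'
)
biggest_gainers = [
    {"rank": int(r.select_one(".rank").text), "currency": r.select_one(".alias").text}
    for r in rows
]
print(biggest_gainers)
```

Only the rows of the second card come back, because the :has() condition is attached to the card div rather than to the rows.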

CodePudding user response:

As mentioned by @QHarr, you should avoid dynamic identifiers. Similar to his approach, the selection here relies on :-soup-contains() and the known text of the element:

soup.select('div:has(>div>span:-soup-contains("Biggest Gainers")) ~ div')

To extract the texts, I used stripped_strings and zipped them with the keys into a dict:

dict(zip(['rank','name','alias','change'],e.stripped_strings))

Example:

from bs4 import BeautifulSoup
import requests

url = 'https://coinmarketcap.com/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')  # name the parser explicitly
data = []
for e in soup.select('div:has(>div>span:-soup-contains("Biggest Gainers")) ~ div'):
    data.append(dict(zip(['rank','name','alias','change'],e.stripped_strings)))

Output:

[{'rank': '1', 'name': 'Tenset', 'alias': '10SET', 'change': '1406.99'},
 {'rank': '2', 'name': 'Burn To Earn', 'alias': 'BTE', 'change': '348.89'},
 {'rank': '3', 'name': 'MetaCars', 'alias': 'MTC', 'change': '332.05'}]
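
The stripped_strings step can be checked in isolation. This is a sketch on hypothetical markup (the row class is made up), reusing two of the values from the output above:

```python
from bs4 import BeautifulSoup

# Hypothetical row markup; the values are copied from the output above.
html = """
<div class="row"><span>1</span> <p>Tenset</p> <p>10SET</p> <span>1406.99</span></div>
<div class="row"><span>2</span> <p>Burn To Earn</p> <p>BTE</p> <span>348.89</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

keys = ["rank", "name", "alias", "change"]
# stripped_strings yields each text node with surrounding whitespace removed,
# skipping whitespace-only nodes; zip pairs them with the keys in order.
data = [dict(zip(keys, row.stripped_strings)) for row in soup.select("div.row")]
print(data)
```

Note that zip stops at the shorter input, so a row with fewer text nodes than keys silently produces a dict with missing keys rather than an error.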

CodePudding user response:

You can use :nth-of-type to locate the "Biggest Gainers" parent div:

import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://coinmarketcap.com/').text, 'html.parser')
bg = d.select_one('div:nth-of-type(2).sc-16r8icm-0.sc-1uagfi2-0.bdEGog.sc-1rmt1nr-1.eCWTbV')
data = [{'rank': i.select_one('span.rank').text,
         'name': i.select_one('p.sc-1eb5slv-0.iworPT').text,
         'change': i.select_one('span.sc-27sy12-0.gLZJFn').text}
        for i in bg.select('div.sc-1rmt1nr-0.sc-1rmt1nr-4.eQRTPY')]

Output:

[{'rank': '1', 'name': 'Tenset', 'change': '1308.72%'}, {'rank': '2', 'name': 'Burn To Earn', 'change': '421.82%'}, {'rank': '3', 'name': 'Aigang', 'change': '329.63%'}]
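
One caveat worth knowing with :nth-of-type: it counts sibling elements by tag name, not by class, so an extra div ahead of the cards shifts the index. A small sketch on hypothetical markup (the card, banner and title classes are made up):

```python
from bs4 import BeautifulSoup

# Hypothetical cards; only the sibling order matters here.
cards = (
    '<div class="card"><p class="title">Trending</p></div>'
    '<div class="card"><p class="title">Biggest Gainers</p></div>'
)
plain = BeautifulSoup(f"<section>{cards}</section>", "html.parser")
# Same cards, but with an unrelated div inserted before them.
shifted = BeautifulSoup(
    f'<section><div class="banner"></div>{cards}</section>', "html.parser"
)

selector = "div:nth-of-type(2).card"  # second div among its siblings, with class card
picked = plain.select_one(selector).get_text(strip=True)
picked_shifted = shifted.select_one(selector).get_text(strip=True)
print(picked, "|", picked_shifted)
```

In the second document the same selector lands on the "Trending" card, which is why the text-based selection in the other answers tends to survive layout changes better.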