Web scrape second number between tags

Time:03-16

I am new to Python and have never worked with HTML, so any help would be appreciated. I need to extract two numbers, '1062' and '348', from a website's markup as shown in the browser's Inspect Element view. This is my code:

page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")

soup = BeautifulSoup(page.content, 'html.parser')

Adv = soup.select_one ('.col-sm-6 .advDec:nth-child(1)').text[10:]

Dec = soup.select_two ('.col-sm-6 .advDec:nth-child(2)').text[10:]
  

The website element looks like below:

<div >
            <div >
                <div >
                    <h4>Stocks</h4>
                </div>
                <div >
                    <p ><a href="/?pageView=nse-top-gainers" title="Click to view list of Advanced stocks">Advanced:</a> 1062</p>
                </div>
                <div >
                    <p ><a href="/?pageView=nse-top-losers" title="Click to view list of Declined stocks">Declined:</a> 348</p>
                </div>
            </div>
        </div>

With my code I am able to extract the first number (1062), but not the second one (348). Can you please help?

CodePudding user response:

Assuming the pattern is always the same, you can select each element by its text and read its next_sibling:

adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()

Example

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
soup = BeautifulSoup(page.content, "html.parser")

adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()

print(adv, dec)
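
To try the same selectors without requesting the live page, here is a minimal sketch that runs them against the markup quoted in the question. The HTML string and the count_after_label helper are assumptions made for illustration and are not part of the answer above. The helper also converts the text to an integer and returns None when the label is not found.

```python
from bs4 import BeautifulSoup

# Markup adapted from the question; the attributes missing there are left out.
HTML = """
<div>
  <div><h4>Stocks</h4></div>
  <div><p><a href="/?pageView=nse-top-gainers">Advanced:</a> 1062</p></div>
  <div><p><a href="/?pageView=nse-top-losers">Declined:</a> 348</p></div>
</div>
"""

def count_after_label(soup, label):
    """Hypothetical helper: integer that follows the <a> whose text contains `label`."""
    link = soup.select_one(f'a:-soup-contains("{label}")')
    if link is None or link.next_sibling is None:
        return None  # label not present, or nothing follows the link
    # next_sibling is the text node right after </a>, e.g. " 1062"
    return int(str(link.next_sibling).strip())

soup = BeautifulSoup(HTML, "html.parser")
adv = count_after_label(soup, "Advanced:")
dec = count_after_label(soup, "Declined:")
print(adv, dec)
```

The None check guards against the case where select_one finds no match, which would otherwise raise an AttributeError on .next_sibling.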

CodePudding user response:

If there are always exactly two elements, the simplest way is probably to unpack the list of selected elements:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
soup = BeautifulSoup(page.content, "html.parser")

adv, dec = [elm.next_sibling.strip() for elm in soup.select(".advDec a") ]
print("Advanced:", adv)
print("Declined:", dec)
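
Unpacking into exactly two names raises a ValueError if the selector ever matches more or fewer than two links. A possible variation, sketched below, keys each number by its label instead. It assumes the advDec class sits on each <p>; the markup quoted in the question lost its attributes, so that placement is a guess, and the HTML string is only a stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Assumption: class="advDec" is on each <p>; the question's markup shows no attributes.
HTML = """
<div>
  <div><h4>Stocks</h4></div>
  <div><p class="advDec"><a href="/?pageView=nse-top-gainers">Advanced:</a> 1062</p></div>
  <div><p class="advDec"><a href="/?pageView=nse-top-losers">Declined:</a> 348</p></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Map each link's label (without the trailing colon) to the number after it.
counts = {
    a.get_text(strip=True).rstrip(":"): int(str(a.next_sibling).strip())
    for a in soup.select(".advDec a")
}
print(counts)
```

Looking values up with counts["Advanced"] and counts["Declined"] then does not depend on how many matching links the page contains or on their order.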