Is there any way to scrape a value from a webpage that fetches its value from an API?

I am working on a project and need to fetch '6596626' from the source code of url = "https://www.screener.in/company/ITC/consolidated/". The value is not visible on the web page, which makes it difficult to extract using XPath. The code below is the part of the page's source that contains the value I want to extract.

    <div
      data-company-id="1552"
      data-warehouse-id="6596626"
      data-user-is-registered="true"
      data-consolidated="true"
      id="company-info">
    </div>

This is the code I tried. I expected to extract the value straight from the source code, but got no result.

    from urllib import request
    from bs4 import BeautifulSoup
    from lxml import etree

    symbol=input("Enter symbol of the company\n")
    response = request.urlopen("https://www.screener.in/company/" + symbol + "/consolidated/")
    page_source = response.read().decode('utf-8')
    soup=BeautifulSoup(page_source,'html.parser')
    id=soup.get_text('data-warehouse-id')
    print(id)

CodePudding user response：

    from bs4 import BeautifulSoup
    import requests


    def main(url):
        r = requests.get(url)
        soup = BeautifulSoup(r.text, 'lxml')
        print(soup.select_one('#company-info')['data-warehouse-id'])


    main('https://www.screener.in/company/ITC/consolidated/')

Output:

    6596626
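
The attempt in the question returns nothing useful because `get_text()` only returns visible text, and its first argument is a separator string, not an attribute name. The value lives in an attribute of the `#company-info` element, so it has to be read from the tag itself. As a minimal sketch of the same attribute lookup without network access or third-party packages, the following uses the standard library's `html.parser` on the snippet from the question. `SAMPLE_HTML`, `CompanyInfoParser` and `warehouse_id` are illustrative names, not part of the original answer, and the live page may differ from the snippet.

```python
from html.parser import HTMLParser

# Snippet copied from the question; assumed to match the real page structure.
SAMPLE_HTML = """
<div
  data-company-id="1552"
  data-warehouse-id="6596626"
  data-user-is-registered="true"
  data-consolidated="true"
  id="company-info">
</div>
"""


class CompanyInfoParser(HTMLParser):
    """Collects the attributes of the first element whose id is 'company-info'."""

    def __init__(self):
        super().__init__()
        self.attrs = None  # stays None if the element is never seen

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        if self.attrs is None and attr_map.get("id") == "company-info":
            self.attrs = attr_map


def warehouse_id(html):
    """Return data-warehouse-id as a string, or None if it is missing."""
    parser = CompanyInfoParser()
    parser.feed(html)
    if parser.attrs is None:
        return None
    return parser.attrs.get("data-warehouse-id")


print(warehouse_id(SAMPLE_HTML))
```

Returning `None` instead of raising keeps the caller in control when the element or the attribute is absent, for example when the site serves a different page to unauthenticated requests.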

CodePudding user response：

If the value of data-warehouse-id is all you want, just get the source HTML and regex that thing out.

For example:

    import re

    import requests

    # \d+ captures one or more digits between the quotes
    data_id = (
        re.search(
            r'data-warehouse-id="(\d+)"',
            requests.get("https://www.screener.in/company/ITC/consolidated/").text,
        ).group(1)
    )
    print(data_id)

Output:

    6596626
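
One caveat with the regex approach: `re.search` returns `None` when the attribute is not in the HTML, so calling `.group(1)` directly raises `AttributeError`. A minimal sketch of a guarded version follows, run against the snippet from the question rather than a live request. `WAREHOUSE_ID_RE` and `find_warehouse_id` are hypothetical names introduced here for illustration.

```python
import re

# Compiled once; \d+ matches one or more digits inside the quotes.
WAREHOUSE_ID_RE = re.compile(r'data-warehouse-id="(\d+)"')


def find_warehouse_id(html):
    """Return the id as an int, or None when the attribute is not present."""
    match = WAREHOUSE_ID_RE.search(html)
    return int(match.group(1)) if match else None


# Snippet taken from the question, used here instead of fetching the page.
sample = '<div data-company-id="1552" data-warehouse-id="6596626" id="company-info"></div>'

print(find_warehouse_id(sample))         # 6596626
print(find_warehouse_id("<div></div>"))  # None
```

To use it on the real page, pass `requests.get(url).text` as `html`, as in the answer above.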