I am working on a project where I need to fetch the value '6596626' from the page source of url = "https://www.screener.in/company/ITC/consolidated/". The value is not visible on the rendered page, which makes it difficult to extract with XPath. The snippet below is the part of the page source that contains the value I want to extract.
<div
data-company-id="1552"
data-warehouse-id="6596626"
data-user-is-registered="true"
data-consolidated="true"
id="company-info">
</div>
This is the code I tried. I expected it to extract the value straight from the page source, but it produced no result.
from urllib import request
from bs4 import BeautifulSoup
from lxml import etree
symbol=input("Enter symbol of the company\n")
response = request.urlopen("https://www.screener.in/company/" + symbol + "/consolidated/")
page_source = response.read().decode('utf-8')
soup=BeautifulSoup(page_source,'html.parser')
id=soup.get_text('data-warehouse-id')
print(id)
CodePudding user response:
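from bs4 import BeautifulSoup
import requests


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # The value is an attribute of the <div id="company-info"> element,
    # so select the element by its id and read the attribute from it.
    print(soup.select_one('#company-info')['data-warehouse-id'])


main('https://www.screener.in/company/ITC/consolidated/')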
Output:
6596626
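The same attribute lookup can also be sketched without third-party packages, using the standard library's html.parser instead of BeautifulSoup. This is an alternative to the answer above, not what it uses. The names AttrFinder and find_attr are hypothetical helpers made up for illustration, and the HTML is the snippet from the question, so no network access is needed:

```python
from html.parser import HTMLParser

# HTML fragment copied from the question.
SNIPPET = """
<div
data-company-id="1552"
data-warehouse-id="6596626"
data-user-is-registered="true"
data-consolidated="true"
id="company-info">
</div>
"""


class AttrFinder(HTMLParser):
    """Record one attribute from an element that has the given id (hypothetical helper)."""

    def __init__(self, element_id, attr_name):
        super().__init__()
        self.element_id = element_id
        self.attr_name = attr_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        attr_map = dict(attrs)
        if self.value is None and attr_map.get("id") == self.element_id:
            self.value = attr_map.get(self.attr_name)


def find_attr(html, element_id, attr_name):
    """Return the attribute value as a string, or None if it is not found."""
    finder = AttrFinder(element_id, attr_name)
    finder.feed(html)
    return finder.value


print(find_attr(SNIPPET, "company-info", "data-warehouse-id"))  # prints 6596626
```

Returning None when nothing matches lets the caller decide how to handle a page whose layout has changed, rather than failing with an exception.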
CodePudding user response:
If the value of data-warehouse-id is all you want, just get the source HTML and regex that thing out. For example:
import re
import requests

# \d+ matches one or more digits; group(1) is the captured id.
data_id = (
    re.search(
        r'data-warehouse-id=\"(\d+)\"',
        requests.get("https://www.screener.in/company/ITC/consolidated/").text,
    ).group(1)
)
print(data_id)
Output:
6596626
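Note that re.search returns None when the pattern does not match, so calling .group(1) directly would raise AttributeError if the attribute were ever missing from the page. A minimal sketch of a guarded version, run against an abbreviated copy of the snippet from the question instead of a live request (extract_warehouse_id is a hypothetical helper name):

```python
import re

# Abbreviated copy of the <div> from the question, on one line.
SNIPPET = '<div data-company-id="1552" data-warehouse-id="6596626" id="company-info"></div>'

# Compile once; the parentheses capture the run of digits inside the quotes.
PATTERN = re.compile(r'data-warehouse-id="(\d+)"')


def extract_warehouse_id(html):
    """Return the id as a string, or None if the attribute is absent."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


print(extract_warehouse_id(SNIPPET))        # prints 6596626
print(extract_warehouse_id("<div></div>"))  # prints None
```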