I would like help writing a program that captures the value 10,599 from the site below.
https://www.immd.gov.hk/eng/stat_20221001.html
The element that contains the value that I want to capture is as follows.
<td headers="Total_Departure" data-label="Total">10,599</td>
Thanks for the help in advance!
I am fairly new to Python and only recently learned that I can use Selenium or bs4 (BeautifulSoup) to scrape a website into a table. I have researched many websites and watched countless YouTube videos, but my efforts have been in vain.
Answer:
You can use this XPath locator, for example with Selenium:
.//td[@headers='Control_Point' and text()='Total']//parent::tr//td[@headers='Total_Departure']
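A minimal sketch of how this locator behaves, assuming lxml is installed (pip install lxml). Instead of driving a browser, it evaluates the same XPath string against SAMPLE_HTML, a hypothetical, simplified table modelled on the <td> quoted in the question. The row labels and the placeholder value in the first row are invented for illustration, and the real page's markup may differ. With Selenium 4, the same string would typically be passed as driver.find_element(By.XPATH, LOCATOR).text.

```python
# Sketch: evaluate the answer's XPath locator with lxml (pip install lxml).
# SAMPLE_HTML is HYPOTHETICAL: a simplified table modelled on the <td> in the
# question. The first row's label and value are placeholders, not real data.
from lxml import html

LOCATOR = (
    ".//td[@headers='Control_Point' and text()='Total']"
    "//parent::tr//td[@headers='Total_Departure']"
)

SAMPLE_HTML = """
<table>
  <tr>
    <td headers="Control_Point">Placeholder point</td>
    <td headers="Total_Departure" data-label="Total">123</td>
  </tr>
  <tr>
    <td headers="Control_Point">Total</td>
    <td headers="Total_Departure" data-label="Total">10,599</td>
  </tr>
</table>
"""


def total_departures(page_source: str) -> str:
    """Return the text of the Total_Departure cell in the row labelled 'Total'."""
    tree = html.fromstring(page_source)
    cells = tree.xpath(LOCATOR)  # list of matching <td> elements
    if not cells:
        raise ValueError("no matching cell found")
    return cells[0].text_content().strip()


print(total_departures(SAMPLE_HTML))  # 10,599
```

The //parent::tr step works because // expands to descendant-or-self::node(), which includes the matched td itself; writing /parent::tr/td[@headers='Total_Departure'] (or /../td[@headers='Total_Departure']) selects the same cell more directly.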
Answer:
Using BeautifulSoup to find tags with specific attributes:
Code
import requests
from bs4 import BeautifulSoup

url = 'https://www.immd.gov.hk/eng/stat_20221001.html'
req = requests.get(url)

if req.ok:
    # html5lib must be installed separately; 'lxml' or the built-in 'html.parser' can be used instead
    soup = BeautifulSoup(req.content, 'html5lib')
    # Many cells have the desired attributes. We take the last one.
    table_cell = soup.find_all('td', attrs={"class": "hRight",
                                            "data-label": "Total",
                                            "headers": "Total_Departure"}
                               )[-1]  # -1 selects the last cell in the list found
    print(table_cell.text)  # print the table cell's text
else:
    raise Exception("Sorry, error with url request")
Output
10,599
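If installing third-party packages is not an option, the same idea can be sketched with only the standard library's html.parser module. This is a swapped-in technique, not what the answer above uses. It assumes the target cells are written like the <td> in the question (headers="Total_Departure", data-label="Total", with an explicit closing </td>) and, like the answer, takes the last match. It also converts the text to an int, which helps if the value is going into a table or calculation. The class and function names are hypothetical.

```python
# Dependency-free sketch using only the standard library's html.parser.
# Assumption: target cells look like the <td> quoted in the question
# (headers="Total_Departure", data-label="Total", closed with </td>).
# As in the BeautifulSoup answer, the LAST matching cell is used.
from html.parser import HTMLParser


class TotalDepartureParser(HTMLParser):
    """Collect the text of every <td headers="Total_Departure" data-label="Total">."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []
        self._in_target = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if (tag == "td"
                and attributes.get("headers") == "Total_Departure"
                and attributes.get("data-label") == "Total"):
            self._in_target = True
            self._buffer = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until </td>.
        if self._in_target:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_target:
            self.values.append("".join(self._buffer).strip())
            self._in_target = False


def last_total_departure(page_source: str) -> int:
    """Return the last matching cell as an int, e.g. '10,599' -> 10599."""
    parser = TotalDepartureParser()
    parser.feed(page_source)
    parser.close()
    if not parser.values:
        raise ValueError("no Total_Departure cell found")
    # Drop the thousands separator so the number can be used in calculations.
    return int(parser.values[-1].replace(",", ""))


snippet = '<td headers="Total_Departure" data-label="Total">10,599</td>'
print(last_total_departure(snippet))  # 10599
```

To use it on the live page, fetch the HTML first (for example with requests.get(url).text, as in the answer above) and pass the resulting string to last_total_departure.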