Use Python to scrape a table from a website

I am looking for help writing a program to capture the value 10,599 from the site below.

https://www.immd.gov.hk/eng/stat_20221001.html

The element that contains the value that I want to capture is as follows.

<td headers="Total_Departure" data-label="Total">10,599</td>

Thanks for the help in advance!

I am fairly new to Python and only recently learned that I can use Selenium or bs4 to scrape a table from a website. After reading many websites and watching countless YouTube videos, my efforts have still been in vain.

CodePudding user response：

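You can use this XPath locator (for example, with Selenium). It finds the row whose Control_Point cell reads "Total", then selects the Total_Departure cell in that row: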

.//td[@headers='Control_Point' and text()='Total']//parent::tr//td[@headers='Total_Departure']
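To see how the locator walks from the "Total" row label to the departure cell, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The HTML fragment, the "Example Point" row, and its value 111 are hypothetical placeholders, not the real page markup. ElementTree supports only a subset of XPath (no `and`, no `parent::` axis), so the expression is rewritten with chained predicates and `..`. With Selenium, the original locator would instead be passed to `driver.find_element(By.XPATH, ...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment that mimics the row layout.
# The real page has many more rows and columns.
SAMPLE = """
<table>
  <tr>
    <td headers="Control_Point">Example Point</td>
    <td headers="Total_Departure" data-label="Total">111</td>
  </tr>
  <tr>
    <td headers="Control_Point">Total</td>
    <td headers="Total_Departure" data-label="Total">10,599</td>
  </tr>
</table>
"""

# ElementTree-compatible rewrite of the locator:
#   1. find the td whose headers is Control_Point and whose text is "Total"
#   2. step up to its parent tr with ".."
#   3. pick the td whose headers is Total_Departure in that row
ROW_TOTAL_PATH = (
    ".//td[@headers='Control_Point'][.='Total']"
    "/../td[@headers='Total_Departure']"
)


def find_total_departure(markup: str) -> str:
    """Return the text of the Total_Departure cell in the 'Total' row."""
    root = ET.fromstring(markup)
    cell = root.find(ROW_TOTAL_PATH)
    if cell is None:
        raise ValueError("No matching cell found")
    return cell.text


print(find_total_departure(SAMPLE))  # 10,599
```

Anchoring on the row label rather than on position means the lookup should keep working if rows are added above the total, as long as the label text stays the same.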

CodePudding user response：

Use BeautifulSoup to find tags with specific attributes.

Code

import requests
from bs4 import BeautifulSoup

url = 'https://www.immd.gov.hk/eng/stat_20221001.html'
req = requests.get(url)

if req.ok:
    # 'html5lib' requires the html5lib package; the built-in 'html.parser' or 'lxml' may also work
    soup = BeautifulSoup(req.content, 'html5lib')
    # Many cells have the desired attributes. We take the last one.
    matching_cells = soup.find_all('td', attrs={"class": "hRight",
                                                "data-label": "Total",
                                                "headers": "Total_Departure"})
    table_cell = matching_cells[-1]  # -1 selects the last cell in the list

    print(table_cell.text)  # print the table cell's text
else:
    raise Exception("Sorry, error with url request")

Output

10,599
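
The printed value is a string that contains a thousands separator. If a number is needed for further calculations, a small helper along these lines could convert it. The name `parse_count` is hypothetical, and the sketch assumes the cell always holds a plain comma-grouped integer.

```python
def parse_count(cell_text: str) -> int:
    """Convert a comma-grouped count such as '10,599' to an int."""
    # Strip surrounding whitespace, then drop the thousands separators.
    return int(cell_text.strip().replace(",", ""))


print(parse_count("10,599"))  # 10599
```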