Python - Extracting info from website using BeautifulSoup

Time:09-11

I am new to BeautifulSoup and am trying to extract data from the following website: https://excise.wb.gov.in/CHMS/Public/Page/CHMS_Public_Hospital_Bed_Availability.aspx

I want to extract the hospital bed availability information (along with the detailed breakdown) after choosing a particular district and selecting the 'With available bed only' option.

Should I target the table, the td, the tbody, or the div class in this case?

My current code:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://excise.wb.gov.in/CHMS/Public/Page/CHMS_Public_Hospital_Bed_Availability.aspx').text
soup = BeautifulSoup(html_text, 'lxml')
locations = soup.find('div', {'class': 'col-lg-12 col-md-12 col-sm-12'})
print(locations)

This only prints blank output.

I have also tried selecting tbody and table, but still could not get it to work. Any help would be greatly appreciated!

EDIT: Searching for a specific element returns []. The code:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://excise.wb.gov.in/CHMS/Public/Page/CHMS_Public_Hospital_Bed_Availability.aspx').text
soup = BeautifulSoup(html_text, 'lxml')
location = soup.find_all('h5')
print(location)

CodePudding user response:

This is probably a dynamic website: the content you see in the browser is loaded or updated by scripts after the initial HTML arrives. requests only downloads that initial HTML, so bs4 never sees the data you are looking for.

For dynamic pages like this, you can use Selenium to render the page and then pass the resulting HTML to bs4.

https://selenium-python.readthedocs.io/index.html
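A minimal sketch of combining the two, assuming Selenium 4 with a local Chrome install, and assuming the data of interest ends up in h5 elements once the page has rendered (the tag is taken from the question's own attempt, not verified against the site). The helper names fetch_rendered_html and extract_headings are made up for this illustration.

```python
from bs4 import BeautifulSoup

URL = "https://excise.wb.gov.in/CHMS/Public/Page/CHMS_Public_Hospital_Bed_Availability.aspx"


def fetch_rendered_html(url, wait_tag="h5", timeout=15):
    """Load the page in headless Chrome and return the HTML after scripts have run."""
    # Selenium is imported here so extract_headings() stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: at least one <h5> appears once the dynamic content has loaded.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, wait_tag))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def extract_headings(html, tag="h5"):
    """Return the stripped text of every matching tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag)]


if __name__ == "__main__":
    print(extract_headings(fetch_rendered_html(URL)))
```

Choosing a district or ticking 'With available bed only' would need extra Selenium interactions before reading page_source; the locators for those controls have to be looked up in the page itself, so they are not shown here.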

CodePudding user response:

To find the element you need, you might want to try using a CSS selector (CSS path).
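As a rough illustration of the CSS selector approach, bs4 exposes select(), which takes a CSS selector string. The markup below is hypothetical: it only borrows the div class and the h5 tag from the question and is not the site's real structure, so the selectors would need adjusting against the rendered page. The helper name select_text is also made up for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class and tags named in the question.
SAMPLE_HTML = """
<div class="col-lg-12 col-md-12 col-sm-12">
  <h5>Hospital A</h5>
  <table><tbody><tr><td>Total beds</td><td>40</td></tr></tbody></table>
</div>
<div class="col-lg-12 col-md-12 col-sm-12">
  <h5>Hospital B</h5>
  <table><tbody><tr><td>Total beds</td><td>25</td></tr></tbody></table>
</div>
"""


def select_text(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


# Chained class selectors match a div carrying all three classes, in any order.
names = select_text(SAMPLE_HTML, "div.col-lg-12.col-md-12.col-sm-12 > h5")
# A descendant selector reaches the cells without naming tbody or tr explicitly.
cells = select_text(SAMPLE_HTML, "div.col-lg-12 table td")

print(names)
print(cells)
```

A selector like this only helps once the HTML actually contains the elements, so on a dynamic page it would be applied to the rendered source rather than to the raw requests response.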
