I am trying to scrape data from all 37 pages of this website.
The website I am scraping doesn't let me go to the next page by changing the URL in the address bar.
This is the HTML for the Next button:
<a href="javascript:void('Next')" >
<svg viewBox="0 0 36 36" data-use="/cms/svg/site/icon_caret_right.36.svg">
(path tag and data)
</svg>
</a>
I know that this can be done with Selenium, but is there any way to do this with BeautifulSoup?
Is there any way to scrape data from the next page?
CodePudding user response:
You can reach each page using requests alone. The site loads its results through a POST request, and the PagingID field in the form data selects which page of results comes back, so you can request the pages one after another:
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.stfrancismedicalcenter.com/find-a-provider/'

for page in range(1, 38):
    print(f'\t\tPage: {page}')

    # Form data for the search; only PagingID changes between requests.
    payload = {
        '_m_': 'FindAPhysician',
        'PhysicianSearch$HDR0$PhysicianName': '',
        'PhysicianSearch$HDR0$SpecialtyIDs': '',
        'PhysicianSearch$HDR0$Distance': '5',
        'PhysicianSearch$HDR0$ZipCodeSearch': '',
        'PhysicianSearch$HDR0$Keywords': '',
        'PhysicianSearch$HDR0$LanguageIDs': '',
        'PhysicianSearch$HDR0$Gender': '',
        'PhysicianSearch$HDR0$InsuranceIDs': '',
        'PhysicianSearch$HDR0$AffiliationIDs': '',
        'PhysicianSearch$HDR0$NewPatientsOnly': '',
        'PhysicianSearch$HDR0$InNetwork': '',
        'PhysicianSearch$HDR0$HasPhoto': '',
        'PhysicianSearch$FTR01$PagingID': str(page)}

    response = requests.post(url, data=payload)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each provider is an <li> whose class attribute starts with "half item-".
    items = soup.find_all('li', {'class': re.compile("^half item-")})
    for item in items:
        # The first two <span> tags in the info block hold the name and the type.
        spans = item.find('div', {'class': 'info'}).find_all('span')
        itemName = spans[0].text
        itemType = spans[1].text
        phone = item.find('li', {'class': 'inline-svg phone'}).text.strip()
        address = item.find('address').text.strip().replace('\t', '')

        print(f'\n{itemName}\n{itemType}\n{phone}\n{address}\n')
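If you would rather not hard-code the number of pages, the same idea can be sketched as a small generator that keeps requesting pages until one comes back empty. This is only a sketch under assumptions: build_payload and iter_pages are hypothetical helper names, fetch_items is a hypothetical callable you would write yourself (posting the payload with requests and parsing the response with BeautifulSoup as above), and it assumes the site returns no matching items once you go past the last page, which I have not verified.

```python
from typing import Callable, Dict, Iterator, List

# Field name taken from the payload in the answer above.
PAGING_FIELD = 'PhysicianSearch$FTR01$PagingID'

# Search fields that the answer above sends as empty strings.
EMPTY_FIELDS = (
    'PhysicianName', 'SpecialtyIDs', 'ZipCodeSearch', 'Keywords',
    'LanguageIDs', 'Gender', 'InsuranceIDs', 'AffiliationIDs',
    'NewPatientsOnly', 'InNetwork', 'HasPhoto',
)


def build_payload(page: int) -> Dict[str, str]:
    """Return the form data for one results page (hypothetical helper)."""
    payload = {'_m_': 'FindAPhysician'}
    for name in EMPTY_FIELDS:
        payload[f'PhysicianSearch$HDR0${name}'] = ''
    payload['PhysicianSearch$HDR0$Distance'] = '5'
    payload[PAGING_FIELD] = str(page)
    return payload


def iter_pages(
    fetch_items: Callable[[Dict[str, str]], List[dict]],
    max_pages: int = 100,
) -> Iterator[List[dict]]:
    """Yield the items of each page until a page comes back empty.

    fetch_items is a hypothetical callable: it receives the payload,
    performs the POST request, and returns the parsed items.
    """
    for page in range(1, max_pages + 1):
        items = fetch_items(build_payload(page))
        if not items:
            # Assumption: pages past the last one contain no items.
            break
        yield items
```

Passing the fetching function in as an argument keeps the paging logic separate from the network call, so you can try it out with a fake function before pointing it at the real site. The max_pages cap is just a safety net in case the empty-page assumption does not hold.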