BeautifulSoup output is JSON, not HTML, so I cannot parse it using the .find methods of bs4

Time:12-15

I'm trying to scrape this site. I used the following code:

import requests
import json
from bs4 import BeautifulSoup

api_url ='https://seniorcarefinder.com/Providers/List'

headers= {
    "Content-Type":"application/json; charset=utf-8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0"}

body_first_page={"Services":["Independent Living","Assisted Living","Long-Term Care / Skilled Nursing","Home Care (Non-Medical)","Home Health Care (Medicare-Certified)","Hospice","Adult Day Services","Active Adult Living"],"StarRatings":[],"PageNumber":1,"Location":"Colorado Springs, CO","Geography":{"Latitude":38.833882,"Longitude":-104.821363},"ProximityInMiles":30,"SortBy":"Verified"}
res = requests.post(api_url,data=json.dumps(body_first_page),headers=headers)
soup = BeautifulSoup(res.text,'html.parser')

However, the resulting soup contains JSON, so I cannot parse it using the .find methods of BeautifulSoup. How can I get normal HTML instead, so that I can parse it with the bs4 .find() and .find_all() methods?

CodePudding user response:

I'd recommend just using the JSON directly and converting it to a dict. The endpoint returns structured data rather than markup, so there is no HTML for BS4 to parse, and a dict is easier to query anyway.

With the json library, you can convert the JSON to a dict and then use regular .get() calls to find the info you're looking for.

https://www.w3schools.com/python/python_json.asp
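
A minimal sketch of that approach, using a small hand-written payload in place of the live response so that it runs offline. Only the top-level "Results" key is taken from this thread; the field names "Name" and "City" and their values are hypothetical placeholders, so inspect the real response to see which keys it actually contains.

```python
import json

# Hand-written stand-in for res.text. Only the "Results" key is known from
# this thread; "Name" and "City" are hypothetical field names.
sample_text = (
    '{"Results": ['
    '{"Name": "Example Home", "City": "Colorado Springs"}, '
    '{"Name": "Another Home"}'
    ']}'
)

# json.loads turns the JSON text into plain dicts and lists.
# With requests, res.json() performs the same conversion.
data = json.loads(sample_text)

# .get() returns a default instead of raising KeyError when a key is missing.
results = data.get("Results", [])
names = [item.get("Name") for item in results]
cities = [item.get("City", "unknown") for item in results]

print(names)
print(cities)
```

Against the live endpoint, replacing json.loads(sample_text) with res.json() should give the same kind of structure, assuming the server keeps returning JSON.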

CodePudding user response:

Why not use this structured data directly? With pandas you can simply create a DataFrame from it:

pd.DataFrame(
    requests.post(api_url, data=json.dumps(body_first_page), headers=headers)
    .json()['Results']
)

Example

import pandas as pd
import requests
import json
api_url ='https://seniorcarefinder.com/Providers/List'

headers= {
    "Content-Type":"application/json; charset=utf-8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0"}

body_first_page={"Services":["Independent Living","Assisted Living","Long-Term Care / Skilled Nursing","Home Care (Non-Medical)","Home Health Care (Medicare-Certified)","Hospice","Adult Day Services","Active Adult Living"],"StarRatings":[],"PageNumber":1,"Location":"Colorado Springs, CO","Geography":{"Latitude":38.833882,"Longitude":-104.821363},"ProximityInMiles":30,"SortBy":"Verified"}
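# Send the request once, then build the DataFrame from the 'Results' list
res = requests.post(api_url, data=json.dumps(body_first_page), headers=headers)
df = pd.DataFrame(res.json()['Results'])
print(df.head())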
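
The request body carries a "PageNumber" field, so the same idea can probably be extended to more than the first page. The sketch below is one way to do that, under stated assumptions: it assumes the server returns an empty "Results" list once the pages run out, which this thread does not confirm. The helper names collect_results and fake_fetch are hypothetical, and fake_fetch stands in for the real request so the example runs offline.

```python
import copy


def collect_results(fetch_page, body, max_pages=50):
    """Gather 'Results' from successive pages until a page comes back empty."""
    rows = []
    for page in range(1, max_pages + 1):
        page_body = copy.deepcopy(body)  # leave the caller's dict untouched
        page_body["PageNumber"] = page
        results = fetch_page(page_body).get("Results", [])
        if not results:  # assumption: an empty list marks the last page
            break
        rows.extend(results)
    return rows


def fake_fetch(page_body):
    """Offline stand-in for the real POST request (hypothetical data)."""
    pages = {1: [{"Name": "A"}, {"Name": "B"}], 2: [{"Name": "C"}]}
    return {"Results": pages.get(page_body["PageNumber"], [])}


rows = collect_results(fake_fetch, {"PageNumber": 1, "Location": "Colorado Springs, CO"})
print(len(rows))
```

With the real endpoint, fetch_page could be something like lambda b: requests.post(api_url, json=b, headers=headers).json(), and the collected rows can be passed to pd.DataFrame(rows). The max_pages cap is there only to keep the loop from running indefinitely if the stop condition turns out to be wrong.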