Extracting the content of a bs4.element.Tag into a JSON file

Time:10-11

I can already get the text inside the script tag, but I don't know how to convert it into a JSON file with a specific dictionary structure. I have tried converting the tag to str, but I still get an error.

import requests
import bs4
from bs4 import BeautifulSoup as BS
import html5lib
import json

url = 'https://www.economist.com/'
r = requests.get(url)

soup = BS(r.content,'html.parser')

data = soup.find('script', attrs={'type':'application/ld+json'})

print(str(json.loads(str(data)))) #Output: Error: Expecting value: line 1 column 1 (char 0)

CodePudding user response:

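Here is a working solution. str(data) serializes the whole tag, including the surrounding <script> markup, so json.loads() fails on the very first character. Parse the tag's .string (the text inside the tag) instead: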

import json

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.economist.com/'
r = requests.get(url)

soup = BS(r.content, 'html.parser')

# A page can contain several JSON-LD blocks, so collect all of them
all_data = soup.find_all('script', attrs={'type': 'application/ld+json'})

for data in all_data:
    # data.string is the text inside the <script> tag, without the tag itself
    jsn = json.loads(data.string)
    print(json.dumps(jsn, indent=4))
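Since the question asks for a JSON file rather than printed output, that last step could be sketched roughly as follows. This is only an illustration under stated assumptions: SAMPLE_HTML stands in for the downloaded page (in practice you would pass r.content), and the helper names extract_ld_json and save_json, as well as the output file name, are made up for this example.

```python
import json
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the downloaded HTML (r.content)
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "WebSite", "name": "Example"}</script>
<script type="application/ld+json">{"@type": "Organization", "name": "Example Org"}</script>
</head><body></body></html>
"""


def extract_ld_json(html):
    """Return the parsed content of every JSON-LD <script> block in html."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for tag in soup.find_all("script", attrs={"type": "application/ld+json"}):
        if tag.string:  # skip script tags that have no text content
            blocks.append(json.loads(tag.string))
    return blocks


def save_json(records, path):
    """Write records to path as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4, ensure_ascii=False)


records = extract_ld_json(SAMPLE_HTML)

# Assumed output location: a file in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "ld_json_output.json")
save_json(records, out_path)
```

Collecting all blocks into one list and dumping it once keeps the output file valid JSON; writing each block with a separate json.dump call to the same file would produce several concatenated documents that json.load cannot read back.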