The goal is to extract an MP4 video link from the MLB website.
url = "https://www.mlb.com/video/jeremy-pena-s-solo-homer?t=most-popular"
content = requests.get(url).text
I have found the script tag that holds the target data:
soup = BeautifulSoup(content, "lxml")
all_script_label = soup.find_all(name="script")
target = all_script_label[20].text.split("\n")[1].split("=")[1]
But I can't turn the target into a dict with json.loads; the result is still a string.
json_ob = json.loads(target)
print(type(json_ob))
Which step did I get wrong?
I also tried ast.literal_eval, but that doesn't work either.
CodePudding user response:
The extracted value is JSON that was encoded twice, so the first json.loads call returns a str. Apply json.loads a second time to convert that str to a dict:
import re
import json
import requests
from bs4 import BeautifulSoup
url = "https://www.mlb.com/video/jeremy-pena-s-solo-homer?t=most-popular"
content = requests.get(url).text
soup = BeautifulSoup(content, "lxml")
all_script_label = soup.find_all(name="script")
target = all_script_label[20].text
data = re.search(r"window\.__VIDEO_INIT_STATE__ = (.*)", target).group(1)
data = json.loads(json.loads(data))
print(type(data))
Prints:
<class 'dict'>
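If you are not sure how many times the payload was encoded, a small helper can keep decoding until the result is no longer a string. This is only a sketch that runs offline: script_text is a made-up stand-in for the real script contents, the example.com URL is a placeholder, and decode_nested_json is a hypothetical helper name, not part of the json module.

```python
import json
import re

# Hypothetical stand-in for the <script> text. The real MLB payload is far
# larger; this only mimics the "JSON string inside a JSON string" shape.
inner = {"video": {"url": "https://example.com/clip.mp4"}}
script_text = "window.__VIDEO_INIT_STATE__ = " + json.dumps(json.dumps(inner))


def decode_nested_json(text, max_depth=5):
    """Call json.loads repeatedly while the result is still a str.

    Stops early if a string is not valid JSON and returns it unchanged.
    """
    value = text
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # A plain string that is not JSON: nothing left to decode.
            return value
    return value


# Same pattern as in the answer above: capture everything after the assignment.
raw = re.search(r"window\.__VIDEO_INIT_STATE__ = (.*)", script_text).group(1)

first = json.loads(raw)         # one pass only strips the outer layer
data = decode_nested_json(raw)  # keeps going until a non-str comes back

print(type(first))
print(type(data))
print(data["video"]["url"])
```

With this sample, a single json.loads still gives a str, while the helper returns the dict. The max_depth limit is an arbitrary safety bound so that unexpected input cannot cause many decoding passes.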