I wasn't sure how to word the title, so it's rather long. Feel free to edit it.
I am trying to scrape data from this site, but I can't figure out how to access the individual keys and values inside the 'window.data' object with Beautiful Soup.
For example, I'd like to get the values of yyuid, birthday, etc.
Here is my code:
import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username  # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    # Select the <script> tag whose text contains 'userinfo'
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)

get_profile_html()
Preferably, I would like to have it as JSON, but any solution is welcome.
In advance, thank you for your help!
CodePudding user response:
I tweaked your code so that the function returns the script text instead of only printing it:
import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username  # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return: text of the <script> tag that contains 'userinfo'
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)
    return results  # add return

res = get_profile_html()  # save the result
Then convert it to JSON:
import json

# Keep the text between "window.data =" and the first ";", then parse it as JSON
userinfo = json.loads(res.split(";")[0].split("window.data =")[1])['userinfo']
print(userinfo)
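One caveat with splitting on ";": if any value inside the object contains a semicolon, the JSON gets cut short and json.loads fails. Below is a minimal sketch of a more tolerant variant, assuming the script text has the form window.data = {...}; followed by other statements. The helper name extract_window_data and the sample string are made up for illustration and are not taken from the real page.

```python
import json

def extract_window_data(script_text, marker="window.data"):
    """Parse the object assigned to `marker` in a script's text (hypothetical helper)."""
    start = script_text.find(marker)
    if start == -1:
        raise ValueError(f"{marker!r} not found in script text")
    brace = script_text.find("{", start)
    if brace == -1:
        raise ValueError("no JSON object found after the marker")
    # raw_decode parses exactly one JSON value starting at `brace`
    # and ignores whatever follows it, so no splitting on ";" is needed.
    data, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return data

# Made-up sample; the real script text would come from get_profile_html()
sample = 'window.data = {"userinfo": {"yyuid": "123", "bio": "a;b"}}; window.other = 1;'
userinfo = extract_window_data(sample)["userinfo"]
print(userinfo["yyuid"], userinfo["bio"])
```

With the real page you would call extract_window_data(res)["userinfo"] and then read individual keys such as yyuid with normal dictionary access, provided those keys are present in the returned data.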