How do I get the content inside window.data with beautiful soup and jsonify it so I can choose what

Time:01-03

I didn't know how to put the title, so it's rather long. Feel free to edit it.

I am trying to scrape data from this site, but I can't figure out how to access the individual keys and values within the 'window.data' with beautiful soup.

For example, I'd like to get the values of yyuid, birthday, etc.

The code is as such:

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)

get_profile_html()

Preferably, I would like to have it as JSON, but any solution is welcome.

In advance, thank you for your help!

CodePudding user response:

I tweaked your code so that the function returns the script text instead of only printing it.

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)
    return results # add return

res = get_profile_html() # save the result

Then convert the result to JSON:

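import json
# Keep the text before the first ";", drop the "window.data =" prefix, then parse the rest as JSON.
userinfo = json.loads(res.split(";")[0].split("window.data =")[1])['userinfo']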
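
One caveat with splitting on ";": it cuts the JSON short if any value (a bio, for example) contains a semicolon. A possible alternative, as a minimal sketch, is to find the assignment with a regex and let json.JSONDecoder.raw_decode parse exactly one JSON value from that position, ignoring whatever follows. The helper name extract_window_data and the sample string are made up for illustration; the real page's keys and values may differ.

```python
import json
import re


def extract_window_data(script_text, variable="window.data"):
    """Return the JSON value assigned to `variable` inside a script's text.

    Hypothetical helper: assumes the page contains `window.data = {...}`
    and that the right-hand side is valid JSON.
    """
    # Find "window.data =" and consume the whitespace around the "=" so the
    # match ends right at the first character of the JSON value.
    match = re.search(re.escape(variable) + r"\s*=\s*", script_text)
    if match is None:
        raise ValueError(f"{variable} assignment not found")
    # raw_decode parses a single JSON value starting at the given index and
    # returns (value, end_index); trailing JavaScript is simply not read.
    data, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return data


# Made-up sample standing in for the script text returned by get_profile_html().
sample = (
    'window.data = {"userinfo": {"yyuid": "123", "birthday": "2000-01-01", '
    '"bio": "hi; there"}}; window.other = 1;'
)
info = extract_window_data(sample)["userinfo"]
print(info["yyuid"], info["birthday"])
```

With the real page you would pass res in place of sample, e.g. extract_window_data(res)["userinfo"].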