I have tried using:
from json import loads
from requests import get
text_inside_body_tag = loads(get('https://Clicker-leaderboard.ge1g.repl.co').content)
But it either raises an error about loads being given a bytes object or, when I remove 'loads', it returns the whole HTML code, while I only want the content inside the body tag.
Could anyone help me?
CodePudding user response:
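BeautifulSoup (bs4) is a great module for working with HTML data. The response from requests is the whole HTML page, so you need to parse it and pull out the body text before passing it to json.loads.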
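# Import required modules
import json
import requests
from bs4 import BeautifulSoup
# Retrieve page content and stop early on an HTTP error
response = requests.get("your url")
response.raise_for_status()
# Create a BeautifulSoup object to parse the HTML
# ("lxml" needs the lxml package installed; "html.parser" is built in)
soup = BeautifulSoup(response.content, "lxml")
# Extract the text inside the body tag and strip leading/trailing whitespace
body = soup.find("body").text.strip()
# Parse the extracted text as JSON (only works if the body holds valid JSON)
data = json.loads(body)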
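If the page might not have a body tag, or the body might not contain valid JSON, a slightly more defensive version could look like the sketch below. The function name body_json and the sample HTML (with made-up names and scores) are only illustrations, since I don't know what your leaderboard page actually returns. It uses the built-in "html.parser" backend so it runs without lxml, and it works on an inline string so you can try it without a network call.

```python
import json

from bs4 import BeautifulSoup


def body_json(html):
    """Return the JSON found in the <body> of an HTML document, or None."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    if body is None:
        # No <body> tag in the document
        return None
    text = body.get_text().strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # The body text is empty or is not valid JSON
        return None


# Hypothetical page content, standing in for the real response
sample_html = (
    "<html><head><title>t</title></head>"
    '<body>\n{"alice": 10, "bob": 7}\n</body></html>'
)
print(body_json(sample_html))
```

With a real page you would pass requests.get("your url").content to body_json instead of sample_html.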