How do you get the text inside only the <body> tag using requests.get()?

Time:05-29

I have tried using:

from json import loads
from requests import get

text_inside_body_tag = loads(get('https://Clicker-leaderboard.ge1g.repl.co').content)

But it either gives an error about loads being passed a bytes object, or, when I remove 'loads', it returns the whole HTML code, while I only want the content inside the <body> tag.

Could anyone help me?

CodePudding user response:

BeautifulSoup (bs4) is a great module to work with HTML data.

# Import required modules
import json
import requests
from bs4 import BeautifulSoup

# Retrieve page content
html = requests.get("your url").content
# Create a BeautifulSoup object to handle the HTML (the "lxml" parser needs the lxml package installed)
soup = BeautifulSoup(html, "lxml")

# Extract the text inside the <body> tag and strip leading/trailing whitespace
body = soup.find("body").text.strip()
# Parse the extracted text as JSON (this only works if the body contains valid JSON)
data = json.loads(body)
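
If you would rather not install bs4 and lxml, the same idea can be sketched with only the standard library's html.parser. This is a different technique from the one above, shown as a rough sketch: BodyTextExtractor, extract_body_text and sample_html are names made up for illustration, and the sample string stands in for what requests.get(url).text would return. It assumes the page has an explicit <body> tag and that the body text is valid JSON.

```python
import json
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects the text that appears between <body> and </body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Keep only text seen while inside the body element
        if self.in_body:
            self.chunks.append(data)


def extract_body_text(html):
    """Return the text inside <body>, with surrounding whitespace stripped."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical sample page (assumption), standing in for requests.get(url).text
sample_html = """<html><head><title>Leaderboard</title></head>
<body>
{"alice": 10, "bob": 7}
</body></html>"""

body_text = extract_body_text(sample_html)
# Only valid if the body really contains JSON
data = json.loads(body_text)
```

Note that this collects all text inside the body, including the contents of any script or style tags placed there, so BeautifulSoup is likely the more comfortable choice for anything beyond a simple page.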