I'm trying to scrape a page with Python, and I want to extract content from a button tag and a script tag. For example, given this button:
<button class="xxx" href=www.example.com link="www.link.com"></button>
I want to print its class, its href and the quoted link attribute. And given this script tag:
<script> let x = 10; let y = 20; let link = "www.link.com"; </script>
I want to get x, y and link from it. Can anyone help?
Try:
import re
from bs4 import BeautifulSoup
html_doc = """\
<button class="xxx" href=www.example.com link="www.link.com"></button>
<script>let x = 10; let y = 20; let link = "www.link.com";</script>"""
soup = BeautifulSoup(html_doc, "html.parser")
# print <button> stuff
button = soup.find("button", class_="xxx")
print(f"{button['class']=} {button['link']=} {button['href']=}")
# print <script> stuff
script = soup.find("script").text
# (\S+) captures the run of non-space characters before the ";" or closing quote
x = re.search(r"let x = (\S+);", script).group(1)
y = re.search(r"let y = (\S+);", script).group(1)
link = re.search(r'let link = "(\S+)"', script).group(1)
print(f"{x=} {y=} {link=}")
Prints:
button['class']=['xxx'] button['link']='www.link.com' button['href']='www.example.com'
x='10' y='20' link='www.link.com'
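If the script declares more variables than x, y and link, writing one re.search per name gets repetitive. Here is one possible alternative, not part of the original answer: it collects every simple let/const/var declaration into a dict with re.finditer. DECL_RE and script_variables are hypothetical names, and the pattern assumes each value is either a double-quoted string without escaped quotes or bare text ending at the next ";". It is a regex heuristic, not a JavaScript parser, so template literals, escaped quotes or values spanning several statements will not be handled.

```python
import re

# Hypothetical pattern for simple one-line declarations such as
#   let x = 10;   const name = "abc";   var flag = true;
# Assumption: a value is a double-quoted string (no escaped quotes)
# or unquoted text that runs up to the next semicolon.
DECL_RE = re.compile(r'\b(?:let|const|var)\s+(\w+)\s*=\s*(?:"([^"]*)"|([^;]+));')


def script_variables(script_text):
    """Map each declared variable name to its value, kept as a string."""
    variables = {}
    for match in DECL_RE.finditer(script_text):
        name, quoted, bare = match.groups()
        # String literals come back without their quotes; other values are trimmed.
        variables[name] = quoted if quoted is not None else bare.strip()
    return variables


# Same contents as the <script> tag in the example above.
script = 'let x = 10; let y = 20; let link = "www.link.com";'
print(script_variables(script))  # → {'x': '10', 'y': '20', 'link': 'www.link.com'}
```

In the answer's code you would pass it the text from soup.find("script").text instead of the hard-coded string. The values stay strings, so convert them with int() or float() where you need numbers.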