How to extract HTML button and script content using python?

Time:10-12

I'm scraping a page and want to extract the content of a button tag and a script tag using Python. For example, given:

<button class="xxx" href=www.example.com link="www.link.com"></button>

I want to print the class, href, and the quoted link from the button tag. Similarly, given:

<script> let x = 10; let y = 20; let link = "www.link.com"; </script>

I want to get x, y, and link from the script tag. Can anyone help?

CodePudding user response:

Try:

import re
from bs4 import BeautifulSoup

html_doc = """\
<button class="xxx" href=www.example.com link="www.link.com"></button>
<script>let x = 10; let y = 20; let link = "www.link.com";</script>"""

soup = BeautifulSoup(html_doc, "html.parser")

# print <button> stuff
button = soup.find("button", class_="xxx")
print(f"{button['class']=} {button['link']=} {button['href']=}")

# print <script> stuff
script = soup.find("script").text
x = re.search(r"let x = (\S+);", script).group(1)
y = re.search(r"let y = (\S+);", script).group(1)
link = re.search(r'let link = "(\S+)"', script).group(1)
print(f"{x=} {y=} {link=}")

Prints:

button['class']=['xxx'] button['link']='www.link.com' button['href']='www.example.com'
x='10' y='20' link='www.link.com'
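
If you would rather not install anything, here is a rough sketch of the same idea using only the standard library (html.parser plus re). This swaps BeautifulSoup for Python's built-in HTMLParser. The names ButtonScriptParser, LET_RE, and parse_lets are made up for this example, and the regex assumes the script only contains simple `let name = value;` declarations as in the question. It is not a real JavaScript parser.

```python
import re
from html.parser import HTMLParser


class ButtonScriptParser(HTMLParser):
    """Hypothetical helper: collect <button> attributes and <script> text."""

    def __init__(self):
        super().__init__()
        self.buttons = []   # one dict of attributes per <button>
        self.scripts = []   # one string per <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self.buttons.append(dict(attrs))
        elif tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


# Assumption: declarations look like `let name = value;` where value is
# either a double-quoted string or a bare token such as a number.
LET_RE = re.compile(r'let\s+(\w+)\s*=\s*(?:"([^"]*)"|([^;\s]+))\s*;')


def parse_lets(script_text):
    """Return {name: value} for simple `let` declarations; values stay strings."""
    result = {}
    for m in LET_RE.finditer(script_text):
        name, quoted, bare = m.groups()
        result[name] = quoted if quoted is not None else bare
    return result


html_doc = """\
<button class="xxx" href=www.example.com link="www.link.com"></button>
<script>let x = 10; let y = 20; let link = "www.link.com";</script>"""

parser = ButtonScriptParser()
parser.feed(html_doc)
parser.close()

button = parser.buttons[0]
variables = parse_lets(parser.scripts[0])
print(button.get("class"), button.get("href"), button.get("link"))
print(variables)
```

Two differences from the BeautifulSoup version: here the class comes back as a plain string rather than a list, and using .get() returns None for a missing attribute instead of raising a KeyError. All extracted values are strings, so convert x and y with int() if you need numbers.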