I am trying to get the href of a tag by its class in BeautifulSoup. The tag I am looking for has a class value that contains spaces and a newline. I want to be able to extract "https://www.website.com/" from:
<a class="this is part1 this is part2" target="_self" href="https://www.website.com/">
CodePudding user response:
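The attribute class="this is part1 this is part2" means that the tag has the classes this, is, part1 and part2; spaces and newlines only separate the class names. You can use the CSS selector .this.is.part1.part2 to select it: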
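from bs4 import BeautifulSoup

soup = BeautifulSoup(
    """<a class="this is part1 this is part2" target="_self" href="https://www.website.com/">""",
    "html.parser",
)

# Every dot-separated name in the selector must appear in the tag's class list.
url = soup.select_one("a.this.is.part1.part2")["href"]
print(url)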
Prints:
https://www.website.com/
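Since the question mentions a newline inside the class value, here is a small sketch of that case. The markup is an assumption: the original page's exact newline position is not shown, so one is placed arbitrarily. The point is only that BeautifulSoup splits the class value on any whitespace, so the same selector should still match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the newline position inside class="..." is assumed.
html = '''<a class="this is
    part1 this is part2" target="_self" href="https://www.website.com/">link</a>'''

soup = BeautifulSoup(html, "html.parser")
tag = soup.select_one("a.this.is.part1.part2")

# The class value is stored as a list, split on spaces and newlines alike.
classes = tag["class"]
url = tag["href"]

print(classes)
print(url)
```

Because the selector only checks that each class name is present, the order of the names and the kind of whitespace between them do not matter.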
CodePudding user response:
Well, you can just pass a dictionary of attributes instead of the class_ keyword argument.
from bs4 import BeautifulSoup

html = '<a class="this is part1 this is part2" target="_self" href="https://www.website.com/">'
soup = BeautifulSoup(html, 'html.parser')

# Match on the full class string via an attribute dictionary.
print(soup.find('a', {'class': 'this is part1 this is part2'}).get('href'))
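For comparison, here is a sketch of the same lookup written three ways, assuming the class value used in the find() call above. Matching on the full string is an exact match, so it depends on the order of the class names, whereas a single class name matches regardless of the others. Using .get("href") returns None instead of raising if the attribute is missing.

```python
from bs4 import BeautifulSoup

# Assumed sample markup, using the class value from the answer above.
html = (
    '<a class="this is part1 this is part2" target="_self" '
    'href="https://www.website.com/">link</a>'
)
soup = BeautifulSoup(html, "html.parser")

# 1. Attribute dictionary with the full class string.
by_dict = soup.find("a", {"class": "this is part1 this is part2"})

# 2. The class_ keyword (trailing underscore, because class is a reserved word).
by_keyword = soup.find("a", class_="this is part1 this is part2")

# 3. A single class name also matches, whatever other classes the tag has.
by_one_class = soup.find("a", class_="part1")

# .get() returns None rather than raising KeyError when the attribute is absent.
href = by_dict.get("href")
print(href)
```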