Python web scraping: selecting a tag whose class contains spaces and a newline

I am trying to get the href of a tag by its class in BeautifulSoup. The tag I am looking for has a class attribute that contains spaces and a newline. I want to extract "https://www.website.com/".

<a class="this is part1
this is part2" target="_self" href="https://www.website.com/">

Answer:

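A class attribute such as class="this is part1 this is part2" means that the tag has the classes this, is, part1 and part2. It makes no difference whether the names are separated by spaces or by a newline. You can use the CSS selector .this.is.part1.part2 to select it: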

from bs4 import BeautifulSoup

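# The class attribute contains spaces and a newline; BeautifulSoup splits it
# on whitespace into the classes this, is, part1, this, is, part2
soup = BeautifulSoup(
    """<a class="this is part1\nthis is part2" target="_self" href="https://www.website.com/">""",
    "html.parser",
)

# Chained class selectors: the <a> tag must carry every listed class
url = soup.select_one("a.this.is.part1.part2")["href"]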
print(url)

Prints:

https://www.website.com/
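The newline needs no special handling because the HTML class attribute is a set of whitespace-separated tokens. A newline counts as a separator in the same way a space does. The following minimal sketch shows this using only the standard library's html.parser, not BeautifulSoup. It splits the class attribute on any whitespace and keeps an a tag only if that tag carries every required class, which is the same check the selector a.this.is.part1.part2 expresses. The HrefByClass name is hypothetical. The class value containing a newline is an assumed example.

```python
from html.parser import HTMLParser


class HrefByClass(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags that carry
    all of the required classes, in any order."""

    def __init__(self, required):
        super().__init__()
        self.required = set(required)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # str.split() with no argument splits on any whitespace,
        # so spaces and newlines separate class names equally
        classes = set((attributes.get("class") or "").split())
        if self.required <= classes and "href" in attributes:
            self.hrefs.append(attributes["href"])


html = '<a class="this is part1\nthis is part2" target="_self" href="https://www.website.com/">'
parser = HrefByClass(["this", "is", "part1", "part2"])
parser.feed(html)
print(parser.hrefs)  # ['https://www.website.com/']
```

The sketch uses a set comparison, so the order of the class names in the attribute does not matter. Repeated names such as the second "this" are also ignored.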

Answer:

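You can also pass the attributes as a dictionary instead of using the class_ keyword argument. When the class value contains spaces, BeautifulSoup compares it with the tag's whole class attribute string rather than with individual class names: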

from bs4 import BeautifulSoup

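html = '<a class="this is part1\nthis is part2" target="_self" href="https://www.website.com/">'
soup = BeautifulSoup(html, 'html.parser')

# attrs dictionary; equivalent to class_='this is part1 this is part2'
print(soup.find('a', {'class': 'this is part1 this is part2'}).get('href'))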
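This approach treats the class attribute as a single string, which makes it stricter than the CSS selector: the class names must appear in the same order. The following sketch illustrates the idea with the standard library's html.parser rather than BeautifulSoup. It assumes that the whitespace layout should be ignored, so it collapses every run of whitespace to a single space before comparing. The normalize_space and HrefByExactClass names are hypothetical.

```python
from html.parser import HTMLParser


def normalize_space(value):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return " ".join(value.split())


class HrefByExactClass(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags whose whole class
    string equals `wanted` after whitespace normalization."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = normalize_space(wanted)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            # Compare the full class string, so the order of the names matters
            if normalize_space(attributes.get("class") or "") == self.wanted:
                self.hrefs.append(attributes["href"])


html = '<a class="this is part1\nthis is part2" target="_self" href="https://www.website.com/">'

same_order = HrefByExactClass("this is part1 this is part2")
same_order.feed(html)
print(same_order.hrefs)  # ['https://www.website.com/']

# Same class names in a different order: the exact string comparison fails
other_order = HrefByExactClass("part2 part1 this is")
other_order.feed(html)
print(other_order.hrefs)  # []
```

When the order of the class names in the page may vary, the CSS selector approach is the more robust choice.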