Python BeautifulSoup: Excluding other tags in a select statement

Time:06-04

I'm having trouble selecting text with BeautifulSoup. I am trying to get the text from <span class="data"> elements only, but I keep getting results that include other elements as well. In the code below, the words I want are 'PlayStation 3' and 'Game Boy Advance', not 'PC'. Could you help?

soup:

<span class="data">
                  PlayStation 3
                 </span>,
 <span class="data">
                  Game Boy Advance
                 </span>,
 <span class="data">
                  Dec 8, 2022
                 </span>,
 <span class="data">
 <a href="/game/pc">
                   PC
                  </a>

P.S. I've tried the code below:

consoles = soup.select('span.data')
for console in consoles:
    print(console.get_text(strip=True))

output snippet:

PlayStation 3
Game Boy Advance
Dec 8, 2022
PC

Thanks!

Answer:

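This example selects every <span class="data"> that has no other tags inside it. The :has(*) part matches a span containing any descendant element, and :not(...) inverts that match: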

from bs4 import BeautifulSoup

html_doc = """\
<span class="data">
                  PlayStation 3
                 </span>,
 <span class="data">
                  Game Boy Advance
                 </span>,
 <span class="data">
                  Dec 8, 2022
                 </span>,
 <span class="data">
 <a href="/game/pc">
                   PC
                  </a>
 </span>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for span in soup.select("span.data:not(:has(*))"):
    print(span.get_text(strip=True))

Prints:

PlayStation 3
Game Boy Advance
Dec 8, 2022
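
If the CSS selector feels opaque, the same filter can be sketched in plain Python by passing a function to find_all. This is only an illustrative alternative, not part of the original answer: the helper name is_plain_data_span is made up for this example, and the HTML is a condensed copy of the snippet above, assuming the spans carry class="data".

```python
from bs4 import BeautifulSoup

# Condensed copy of the example HTML (assumes the spans have class="data").
html_doc = """
<span class="data">PlayStation 3</span>
<span class="data">Game Boy Advance</span>
<span class="data">Dec 8, 2022</span>
<span class="data"><a href="/game/pc">PC</a></span>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def is_plain_data_span(tag):
    """Hypothetical helper: True for <span class="data"> with no child tags."""
    return (
        tag.name == "span"
        and "data" in tag.get("class", [])
        # find(True) returns the first descendant tag, or None if there is none
        and tag.find(True) is None
    )


# find_all accepts a function and keeps the tags for which it returns True.
texts = [span.get_text(strip=True) for span in soup.find_all(is_plain_data_span)]
print(texts)
```

This should keep the same three spans as the span.data:not(:has(*)) selector, because both approaches reject any span that contains a descendant element such as the <a> link.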