Clean Beautiful Soup output from unwanted span

Time:10-23

I am reading HTML using Beautiful Soup. I ran the command soup.find_all("span",{"class":"budget-list__data__number budget-list__number show-for-medium"}) and obtained:

[<span >
      4 000 €

      <span >24 <span >votes</span></span>
</span>, <span >
      25 000 €

      <span >24 <span >votes</span></span>
</span>, <span >
      14 000 €

      <span >23 <span >votes</span></span>
</span>, <span >
      35 000 €
      
     .
     .
     .

I am interested in keeping only the monetary amounts (e.g. 4 000 €) and ignoring the nested <span > elements that hold the vote counts. I thought about using span.clear(), but that does not do the trick. Do you have any suggestions?

CodePudding user response:

Try:

spans = soup.find_all(
    "span",
    {"class": "budget-list__data__number budget-list__number show-for-medium"},
)

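for span in spans:
    # contents[0] is the text node that precedes the nested <span>;
    # strip() removes the surrounding whitespace and newlines.
    print(span.contents[0].strip())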

Prints:

4 000 €
25 000 €
14 000 €
35 000 €
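
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The HTML string is made up to resemble the output in the question, and the inner class names ("votes", "label") are assumptions, since the real attributes are not shown. It also explains why span.clear() did not help: clear() removes everything inside the tag, including the text node with the amount. The second list is an alternative that asks explicitly for the first direct text child, which may be a little more robust if the markup changes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the output shown in the question; the inner
# class names ("votes", "label") are assumptions made for illustration only.
html = """
<span class="budget-list__data__number budget-list__number show-for-medium">
      4 000 €

      <span class="votes">24 <span class="label">votes</span></span>
</span>
<span class="budget-list__data__number budget-list__number show-for-medium">
      25 000 €

      <span class="votes">24 <span class="label">votes</span></span>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all(
    "span",
    {"class": "budget-list__data__number budget-list__number show-for-medium"},
)

# Approach from the answer: the first child of each span is the text node
# holding the amount, so take it and strip the surrounding whitespace.
amounts = [span.contents[0].strip() for span in spans]

# Variant: ask for the first direct text child explicitly (recursive=False
# keeps the search from descending into the nested <span> elements).
direct = [span.find(string=True, recursive=False).strip() for span in spans]

print(amounts)
print(direct)
```

Both lists should contain only the amounts, without the vote counts from the nested spans.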