Remove content inside span

Time:10-28

It looks quite easy, but I haven't managed to find a solution. I tried other proposed solutions, such as span.clear(), but they didn't work.

Page structure:

<div class="details">           
  <h2>Public function</h2>  
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>

Result:

Time of Death: 13:38:00

Desired result:

13:38:00

My code:

whole_section = soup.find('div', {'class': "token"})  # Access the whole section.
name_person = whole_section.h2.text  # Select the person's name, inside the "h2" tag.
time_decease = whole_section.h3.next_sibling.next_sibling.next_sibling.next_sibling.text  # The time has no tag of its own, so I had to use "next_sibling".

CodePudding user response:

I wouldn't recommend traversing the DOM by chaining next_sibling calls. In my experience, it makes your script prone to breaking on the smallest changes to the source HTML.
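To make the fragility concrete, here is a minimal sketch. It assumes html.parser and uses a hypothetical helper, fourth_sibling_after_h3, that mimics the four chained next_sibling calls from the question. Whitespace between tags counts as a sibling text node, so the same chain lands on different nodes once the page is minified:

```python
from bs4 import BeautifulSoup

# The same fragment twice: once with newlines between tags, once minified.
pretty = "<div><h3>Name</h3>\n<p><span>NO</span>NO</p>\n<p><span>Time of Death:</span>13:38:00</p></div>"
compact = "<div><h3>Name</h3><p><span>NO</span>NO</p><p><span>Time of Death:</span>13:38:00</p></div>"

def fourth_sibling_after_h3(html):
    """Follow next_sibling four times starting from the <h3> (hypothetical helper)."""
    node = BeautifulSoup(html, "html.parser").h3
    for _ in range(4):
        node = node.next_sibling if node is not None else None
    return node

pretty_result = fourth_sibling_after_h3(pretty)    # the second <p>
compact_result = fourth_sibling_after_h3(compact)  # None: the chain ran off the end

print(pretty_result)
print(compact_result)
```

With the newlines present, the chain steps over two whitespace text nodes and reaches the second paragraph; without them, it runs past the last sibling, and calling .text on the result would raise an AttributeError.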

Instead, find the <p> you're after by passing a lambda to find() that filters on the element's own text (specifically, the 'Time of Death:' string). Then loop over the <span> elements inside that <p> and remove them with decompose(), which leaves only the text you want:

html = '''<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
  </div>
</div>'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

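whole_section = soup.find('div', {'class': "token"})  # Access the whole section.
name_person = whole_section.h2.text  # Select the person's name, inside the "h2" tag.
# Find the <p> whose text contains the 'Time of Death:' label.
time_decease = whole_section.find(lambda element: element.name == 'p' and 'Time of Death:' in element.text)
# Remove every <span> inside that <p> so only the time remains.
for span in time_decease.find_all('span'):
  span.decompose()

print(name_person)
print(time_decease.text.strip())  # strip() drops the leading whitespace left inside the <p>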
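If you would rather not modify the parsed tree, a possible variation is to read only the text nodes that are direct children of the <p>. This is a sketch under assumptions: own_text is a hypothetical helper name, and the HTML is a trimmed copy of the sample above.

```python
from bs4 import BeautifulSoup

html = """<div class="token">
  <h2>Name person</h2>
  <p><span>NO</span>NO</p>
  <p>
      <span>Time of Death:</span>13:38:00</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

def own_text(tag):
    """Join only the text nodes that are direct children of `tag` (hypothetical helper)."""
    return "".join(tag.find_all(string=True, recursive=False)).strip()

# Same lambda filter as above: the <p> whose text contains the label.
paragraph = soup.find(lambda el: el.name == "p" and "Time of Death:" in el.get_text())
time_of_death = own_text(paragraph)

print(time_of_death)
```

Because recursive=False skips the text nested inside the <span>, the label is left out of the result and the <span> itself stays in the tree for any later lookups.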


CodePudding user response:

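You can also try a CSS attribute selector. Note that this example assumes the label is stored in the span's title attribute, which differs from the markup in the question: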

from bs4 import BeautifulSoup

# The "xml" parser requires lxml to be installed.
soup = BeautifulSoup(
"""
<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
      <span>NO</span>NO
    </p>
    <span title="Time of Death:">13:38:00</span>
  </div>
</div>
""", "xml")

print(soup.select_one("span[title*=Time]").text)
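For the markup as posted in the question, where the label is the span's text rather than a title attribute, a similar label-anchored lookup might look like the sketch below. It assumes html.parser and that the time is the text node immediately after the label span:

```python
from bs4 import BeautifulSoup

html = """<div class="token">
  <p><span>NO</span>NO</p>
  <p>
      <span>Time of Death:</span>13:38:00</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the label span by its exact text, then read the text node that follows it.
label = soup.find("span", string="Time of Death:")
value = label.next_sibling.strip() if label and label.next_sibling else None

print(value)
```

This uses a single next_sibling step anchored on the label itself, so it does not depend on how many elements or whitespace nodes come before the paragraph.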