Remove content inside span

It looks quite easy, but I haven't managed to find a solution. I tried other proposed solutions, such as span.clear(), but they didn't work.

Page structure:

<div class="details">           
  <h2>Public function</h2>  
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>

Result:

Time of Death: 13:38:00

Desired result:

13:38:00

My code:

whole_section = soup.find('div', {'class':"token"}) # Access the whole section
name_person = whole_section.h2.text  # Select the person's name, inside the "h2" tag.
time_decease = whole_section.h3.next_sibling.next_sibling.next_sibling.next_sibling.text # The <p> has no class or id, so I had to chain "next_sibling".

CodePudding user response：

I wouldn't really ever recommend traversing the DOM by repeatedly trying to get the next sibling - in my experience, every time you do this it makes your script more and more prone to breakages for the smallest changes in the source HTML.

Instead, find the parent <p> you're after by using a lambda function to filter based on the contents of the <p> itself (the 'Time of Death:' string, specifically); then loop through the child elements of that <p> element and remove each <span>, so that only the text you're after is left:

html = '''<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
  </div>
</div>'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
time_decease = whole_section.find(lambda element: element.name == 'p' and 'Time of Death:' in element.text)
for span in time_decease.find_all('span'):
  span.decompose()

print(name_person)
print(time_decease.text)
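
If you would rather not modify the tree, here is a minimal sketch of a non-destructive variant. It assumes the label is the exact text of a <span> and that the value is a text node sitting directly inside the same <p>; the variable names (label, paragraph, own_text) are just illustrative. It finds the label span, goes up to its parent, and joins only the parent's own text nodes:

```python
from bs4 import BeautifulSoup

html = '''<div class="token">
  <h2>Name person</h2>
  <h3>Name person</h3>
  <p>
      <span>NO</span>NO</p>
  <p>
      <span>Time of Death:</span>13:38:00</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Find the <span> whose text is exactly the label (assumption: exact match).
label = soup.find('span', string='Time of Death:')

# The <p> that holds both the label and the value.
paragraph = label.parent

# recursive=False keeps only text nodes that are direct children of the <p>,
# so the text inside the <span> is skipped and nothing is removed from the tree.
own_text = ''.join(paragraph.find_all(string=True, recursive=False)).strip()

print(own_text)
```

Since nothing is decomposed, the soup can still be searched for the label afterwards.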


CodePudding user response：

You can try this, assuming the label is available as a title attribute on the span rather than as its text:

from bs4 import BeautifulSoup

soup = BeautifulSoup(
"""
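<div class="details">
  <h2>Public function</h2>
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
      <span>NO</span>NO
    </p>
    <span title="Time of Death:">13:38:00</span>
  </div>
</div>
""", "html.parser")

# Match the span whose title attribute contains "Time" and print its text.
print(soup.select_one('span[title*="Time"]').text)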
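
For the markup from the question, where the label is the span's text and not a title attribute, a CSS selector can still work. This is only a sketch: it relies on :-soup-contains(), a Soup Sieve extension (not standard CSS) that needs a reasonably recent Soup Sieve, and it assumes the value is the last piece of text inside the <p>:

```python
from bs4 import BeautifulSoup

html = '''<div class="token">
  <h2>Name person</h2>
  <h3>Name person</h3>
  <p>
      <span>NO</span>NO</p>
  <p>
      <span>Time of Death:</span>13:38:00</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Select the <p> that has a direct child <span> containing the label text.
paragraph = soup.select_one('p:has(> span:-soup-contains("Time of Death:"))')

# stripped_strings yields the label first and then the value after the span,
# so the last item is the time (assumption: nothing follows the value).
time_of_death = list(paragraph.stripped_strings)[-1]

print(time_of_death)
```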