How to remove tags and content after a certain text in beautifulsoup

Time:07-17

I am trying to remove all content after a certain text. The problem is that the text is split up by br tags, so I can't just remove the siblings: some of the text I need to keep sits at the same level, separated by the same tag. I haven't found a solution in a week...

<br/> text to keep
<br/> text to keep
<br/> target text
<br/> text to delete
<br/> target text 
<br/> text to delete
<br/> text to delete

CodePudding user response:

You can do something like the following.

Keep a flag that records whether the target text has been matched yet. Until the first match, add every text to the solution list. Once the target text has been matched, set the flag to False; from then on, only further occurrences of the target text are added and everything else is skipped.

texts = ['text1', 'text2', 'text3', 'text4', 'text5', 'text4', 'text7', 'text4']
target_text = 'text4'
flag = True  # True until the target text is seen for the first time
solution = []

for text in texts:
    if text == target_text:
        # Always keep the target text; the first match turns the flag off.
        solution.append(text)
        flag = False
    elif flag:
        # Keep everything that comes before the first match.
        solution.append(text)
print(solution)

Output:

['text1', 'text2', 'text3', 'text4', 'text4', 'text4']
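
The same flag idea can be applied directly to the parsed tree instead of a plain list. The following is only a sketch: the markup is modelled on the question, and the wrapping div and the name seen_target are assumptions made for illustration. It keeps everything before the first match plus every occurrence of the target text, and removes the other text nodes that follow.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup modelled on the question; the <div> wrapper is an assumption.
html = (
    "<div>"
    "<br/> text to keep"
    "<br/> text to keep"
    "<br/> target text"
    "<br/> text to delete"
    "<br/> target text "
    "<br/> text to delete"
    "<br/> text to delete"
    "</div>"
)
target_text = "target text"

soup = BeautifulSoup(html, "html.parser")
container = soup.div

seen_target = False
# Copy the children into a list first, because extract() mutates the tree.
for node in list(container.children):
    if not isinstance(node, NavigableString):
        continue  # skip the <br/> tags themselves
    if node.strip() == target_text:
        seen_target = True  # keep every occurrence of the target text
    elif seen_target:
        node.extract()  # drop any other text once the target has been seen

remaining = list(container.stripped_strings)
print(remaining)
# ['text to keep', 'text to keep', 'target text', 'target text']
```

Note that the br tags themselves are left in place here; if they should go too, they would need to be extracted in the same loop.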

CodePudding user response:

You can use:

soup.find(string="text_from_el_before_the_one_to_be_removed")

or

soup.find(string="text_from_el_to_be_removed")

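find(string=...) returns the matching text node. From there, remove the nodes you want with extract(), or, working on a tag such as the text node's parent, empty it with clear() or destroy it with decompose().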
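
As a rough sketch of that approach, assuming everything after the first match should go and that the text sits inside a single parent (the div and the sample strings are made up for illustration): find the text node, then extract each sibling that follows it. A plain string passed to find(string=...) has to match exactly, so a function is used here to ignore the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the <div> wrapper is an assumption.
html = (
    "<div>"
    "<br/> text to keep"
    "<br/> target text"
    "<br/> text to delete"
    "<br/> text to delete"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")

# Match the text node whose stripped content equals the target text.
target = soup.find(string=lambda s: s is not None and s.strip() == "target text")

if target is not None:
    # Materialise the siblings first, because extract() changes the tree.
    for sibling in list(target.next_siblings):
        sibling.extract()

result = str(soup)
print(result)
# <div><br/> text to keep<br/> target text</div>
```

This removes both the trailing text and the br tags after the match, which differs from the flag-based answer above: later repeats of the target text are removed as well.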
