How to remove only certain characters given a precondition

Time:10-23

I'm trying to remove specific characters from a list of strings using Python.

My strings look like this:

<p><a href="first/Fruit-Shop-One.html">Fruit-Shop-One</a></p>
<p><a href="first/Fruit-Shop-Two.html">Fruit-Shop-Two</a></p>

What I want is to replace each '-' in the link text with a space without breaking the link, so the href must stay unchanged. The final result should look like this:

<p><a href="first/Fruit-Shop-One.html">Fruit Shop One</a></p>
<p><a href="first/Fruit-Shop-Two.html">Fruit Shop Two</a></p>

Answer:

Here is a quick and dirty way to do this: split each string at '">' (the end of the opening <a> tag), replace the hyphens only in the part after it, and join the parts back together.

strings = ['<p><a href="first/Fruit-Shop-One.html">Fruit-Shop-One</a></p>', '<p><a href="first/Fruit-Shop-Two.html">Fruit-Shop-Two</a></p>']
for string in strings:
    # Split once at the end of the opening <a> tag; the href part stays as-is
    head, text = string.split('">', 1)
    new_string = head + '">' + text.replace("-", " ")
    print(new_string)

Output:

<p><a href="first/Fruit-Shop-One.html">Fruit Shop One</a></p>
<p><a href="first/Fruit-Shop-Two.html">Fruit Shop Two</a></p>
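A slightly more defensive sketch of the same split-and-join idea, wrapped in a function. The name `dehyphenate_link_text` is hypothetical. `str.partition` splits at the first '">' only and returns an empty separator when it is absent, so strings without a link come back unchanged. Like the loop above, it also replaces any hyphens in the closing tags after the link text (there are none in these examples).

```python
def dehyphenate_link_text(html: str) -> str:
    """Replace '-' with ' ' in everything after the first '">' (hypothetical helper)."""
    head, sep, tail = html.partition('">')
    if not sep:
        # No '">' found: leave the string as it is
        return html
    return head + sep + tail.replace("-", " ")


strings = [
    '<p><a href="first/Fruit-Shop-One.html">Fruit-Shop-One</a></p>',
    '<p><a href="first/Fruit-Shop-Two.html">Fruit-Shop-Two</a></p>',
]
new_strings = [dehyphenate_link_text(s) for s in strings]
print(new_strings)
```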

Or, as a list comprehension:

new_strings = [string.split('">', 1)[0] + '">' + string.split('">', 1)[1].replace("-", " ") for string in strings]

Output:

['<p><a href="first/Fruit-Shop-One.html">Fruit Shop One</a></p>', '<p><a href="first/Fruit-Shop-Two.html">Fruit Shop Two</a></p>']
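If only the text between <a ...> and </a> should change (rather than everything after the first '">'), a regular expression can target that span directly. This is a different technique from the split above: a sketch that assumes the link text contains no nested tags. The names `LINK_TEXT` and `replace_hyphens_in_link_text` are hypothetical.

```python
import re

# Captures the opening <a ...> tag, the link text (no nested tags), and the closing </a>
LINK_TEXT = re.compile(r'(<a\b[^>]*>)([^<]*)(</a>)')


def replace_hyphens_in_link_text(html: str) -> str:
    # Only group 2 (the text between the tags) is modified; the href in group 1 is kept
    return LINK_TEXT.sub(
        lambda m: m.group(1) + m.group(2).replace("-", " ") + m.group(3), html
    )


strings = [
    '<p><a href="first/Fruit-Shop-One.html">Fruit-Shop-One</a></p>',
    '<p><a href="first/Fruit-Shop-Two.html">Fruit-Shop-Two</a></p>',
]
new_strings = [replace_hyphens_in_link_text(s) for s in strings]
print(new_strings)
```

Text outside links, such as `<p>a-b</p>`, is left alone because the pattern only matches inside an <a> element.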

Answer:

from bs4 import BeautifulSoup

string_one = '<p><a href="first/Fruit-Shop-One.html">Fruit-Shop-One</a></p>'

soup = BeautifulSoup(string_one, "html.parser")

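# Change only the text inside each <a> tag; the href attribute is left untouched
for a in soup.find_all('a'):
    if a.string is not None:  # .string is None when the tag has more than one child
        a.string = a.string.replace('-', ' ')

new_string = str(soup)

print(new_string)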
# <p><a href="first/Fruit-Shop-One.html">Fruit Shop One</a></p>
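If installing BeautifulSoup isn't an option, a similar parse-based approach can be sketched with the standard library's `html.parser.HTMLParser`. The class `LinkTextDehyphenator` and the function `dehyphenate_links` are hypothetical names. The parser re-emits each start tag verbatim via `get_starttag_text()` (so the href is untouched) and changes hyphens only in text seen while inside an <a> element. It assumes simple snippets like the ones in the question: comments and doctypes are dropped, and end tags are written back in lowercase.

```python
from html.parser import HTMLParser


class LinkTextDehyphenator(HTMLParser):
    """Rebuilds the fed HTML, replacing '-' with ' ' only in text inside <a> tags."""

    def __init__(self):
        # Keep entity references as-is instead of decoding them into text
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.link_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        # The start tag exactly as written, so attributes like href are unchanged
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data.replace("-", " ") if self.link_depth else data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def dehyphenate_links(html: str) -> str:
    parser = LinkTextDehyphenator()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


strings = [
    '<p><a href="first/Fruit-Shop-One.html">Fruit-Shop-One</a></p>',
    '<p><a href="first/Fruit-Shop-Two.html">Fruit-Shop-Two</a></p>',
]
for s in strings:
    print(dehyphenate_links(s))
```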