Python delete character in a string

I'm stuck, I've searched several ways and I can't get the correct output.

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
ns =''.join([i for i in string if i.isalpha()])
print(ns)

HelloIhaveaBigproblemisnotagoodnumber

I want this output:

Hello I have a Big problem is not a good number

Can you help me? Thank you!!

CodePudding user response：

You could extend the condition applied to each character, e.g.,

ns =''.join([i for i in string if i == " " or i.isalpha()])

But there is a problem with sequences like "666 " that leave an extra space in the text.
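To make the extra space visible, here is a small sketch that prints the filtered string with repr(). The names kept and ns are only illustrative, and collapsing the gap with split() and join() is one possible workaround, not necessarily what this answer had in mind.

```python
string = "Hello! I have a Big!!! problem 666 is not a good number__$"

# Keep letters and spaces only; the spaces on both sides of "666" survive,
# so "problem" and "is" end up separated by two spaces.
kept = "".join(c for c in string if c == " " or c.isalpha())
print(repr(kept))

# str.split() with no arguments splits on runs of whitespace and drops
# empty pieces, so re-joining with one space collapses the gap.
ns = " ".join(kept.split())
print(ns)
```

This also removes any leading or trailing spaces, which may or may not be what you want.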

Instead, you could use a regex to break the string into a list of words, each paired with the non-word text that follows it. Filter out what you don't want from both parts, then drop any item whose word ends up empty.

import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
tmp = []

for word, other in re.findall(r"(\w+)([^\w]*)", string):
    # strip non-alpha
    word = "".join(c for c in word if c.isalpha())
    # preserve only spaces
    other = "".join(c for c in other if c == " ")
    # only add if word still exists
    if word:
        tmp.append(word + other)
ns = "".join(tmp)
print(ns)

Output

Hello I have a Big problem is not a good number

CodePudding user response：

Filter using re and replace the unwanted characters with re.sub

import re
string = "Hello! I have a Big!!! problem 666 is not a good number__$"
print(re.sub('[^a-zA-Z]+', ' ', string))

Output

Hello I have a Big problem is not a good number
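One detail that is easy to miss in printed output: assuming the pattern is meant to match one or more non-letters (`[^a-zA-Z]+`), the trailing "__$" is also replaced by a space, so the result ends with a blank. A minimal sketch that shows this with repr() and trims it with strip(); the names result and trimmed are illustrative.

```python
import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"

# Each run of non-letters becomes one space, including the final "__$".
result = re.sub(r"[^a-zA-Z]+", " ", string)
print(repr(result))

# strip() removes the leading/trailing spaces left by runs at the edges.
trimmed = result.strip()
print(repr(trimmed))
```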

CodePudding user response：

You can use:

import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
regex = re.compile('[^a-zA-Z ]')
print(regex.sub('', string))

And it will print:

Hello I have a Big problem  is not a good number

(With double space between "problem" and "is")

If you want to remove the double space, you can write it like this:

import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
regex = re.compile('[^a-zA-Z ]')
string = regex.sub('', string)
print(string.replace('  ', ' '))

And now the output will be:

Hello I have a Big problem is not a good number
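One caveat, as a hedged aside: replace('  ', ' ') makes a single pass, so a run of three or more spaces is shortened but not fully collapsed. The sketch below uses a made-up input to show the difference and swaps in a regex that matches two or more spaces; this is an alternative, not part of the answer above.

```python
import re

text = "a   b"  # hypothetical input with three spaces

# A single replace() pass turns three spaces into two, not one.
once = text.replace("  ", " ")
print(repr(once))

# Matching any run of two or more spaces collapses it in one step.
collapsed = re.sub(r" {2,}", " ", text)
print(repr(collapsed))
```

For the example string in the question the single replace() is enough, because the longest run is two spaces.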

CodePudding user response：

Use a regex to replace all non-alphabetic characters.

import re

text = "Hello! I have a Big!!! problem 666 is not a good number__$"

print(re.sub(r'[\W\d_]+', ' ', text))

Output:

Hello I have a Big problem is not a good number
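This character class behaves differently from `[^a-zA-Z]` on non-ASCII text: for str patterns in Python 3, `\w` is Unicode-aware by default, so accented letters are kept. A small sketch with a made-up sample string (the variable names are illustrative):

```python
import re

sample = "Café número 42"  # hypothetical input with accented letters

# [\W\d_]+ removes non-word characters, digits and underscores,
# but keeps letters such as "é" and "ú".
unicode_aware = re.sub(r"[\W\d_]+", " ", sample).strip()
print(unicode_aware)

# [^a-zA-Z]+ treats every non-ASCII letter as something to remove.
ascii_only = re.sub(r"[^a-zA-Z]+", " ", sample).strip()
print(ascii_only)
```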

CodePudding user response：

A lot of simple problems can be solved without the re library.

For this case, you can filter out every character that is neither a letter of the alphabet nor a space:

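from string import ascii_lowercase

s = "Hello! I have a Big!!! problem 666 is not a good number__$"
# keep only ASCII letters (case-insensitive) and spaces
ns = ''.join(filter(lambda c: c.lower() in ascii_lowercase + ' ', s))
while '  ' in ns: ns = ns.replace('  ', ' ')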


# output:
# 'Hello I have a Big problem is not a good number'

The one-line while loop above collapses repeated spaces into a single space.

To work with non-English characters, you can replace ascii_lowercase with the desired choice of characters.
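As a rough illustration of that last point, the sketch below extends the allowed set with a few accented Spanish letters. The alphabet, the sample sentence and the name allowed are all assumptions chosen for the example.

```python
from string import ascii_lowercase

# Hypothetical alphabet: ASCII letters, some accented letters, and the space.
allowed = set(ascii_lowercase + "áéíóúüñ" + " ")

s = "¡Señor! El café cuesta 42 pesos"

# Compare the lowercased character so uppercase letters are kept too.
ns = "".join(c for c in s if c.lower() in allowed)

# Collapse the double space left where "42" was removed.
while "  " in ns:
    ns = ns.replace("  ", " ")

print(ns)
```

If any letter in any script should be kept, str.isalpha() (as used in the question) is another option, since it is Unicode-aware.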