I have a list of many (18618) strings. I split the text into words by tokenizing it with the NLTK library, but now each word, as shown below, has an extra leading apostrophe, and in some places an "â" character. How can I delete all of these?
I tried removing them with a for loop but couldn't get it to work. What else can I do to solve this problem?
["'heart", "'darkness", "'nellie", "'cruising", "'yawl", ....................................]
CodePudding user response:
lst = ["'heart", "'darkness", "'nellie", "'cruising", "'yawl"]
# for every txt in lst, remove the ' and â characters
new_lst = [txt.replace("'", "").replace("â", "") for txt in lst]
CodePudding user response:
To prevent this in the first place, you may try decoding the text before tokenizing. Caution: when tokenizing a Unicode string, make sure you are not using an encoded version of the string (it may be necessary to decode it first, e.g. with s.decode("utf8")).
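As a rough sketch of that approach (the filename and encoding here are assumptions; adjust them to your source file), read the raw bytes, decode them to a Unicode string, and only then tokenize:

from nltk.tokenize import word_tokenize  # requires the NLTK punkt tokenizer models

# Read the file as raw bytes, then decode explicitly so curly quotes and
# other non-ASCII characters are not mangled into sequences like "â€˜".
with open("heart_of_darkness.txt", "rb") as f:  # hypothetical filename
    raw = f.read()

text = raw.decode("utf8")      # decode before tokenizing
tokens = word_tokenize(text)   # tokenize the decoded Unicode string

If the characters still appear after decoding, the cleanup with str.replace shown in the first answer can be applied to the resulting tokens.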