How can I remove punctuation from all the items in a list in a more pythonic way?

I have a list of strings with lots of unnecessary punctuation marks. How can I remove all punctuation and numbers from these strings in a more pythonic way?

medicinal_chemicals = ['(RS)-2-(4-(2-methylpropyl)phenyl)propanoic acid', 'emtricitabine / tenofovir (Stribild®)', 'Interleukin-2 (Aldesleukin)']

Answer:

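The code below cleans each string in four steps: it replaces punctuation with spaces, replaces digits with spaces, collapses runs of whitespace into single spaces, and finally drops non-ASCII characters such as ®.

If you want to keep non-ASCII characters, skip the last step (the function that encodes the string to ASCII).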
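The ASCII step works by encoding with errors set to "ignore", so it removes every non-ASCII character, not only symbols like ®. A quick standalone check (clean_unicode here is just an isolated copy of that step) shows that accented letters disappear too, which is one reason you might prefer to skip it:

```python
def clean_unicode(s):
    # Encode to ASCII, silently dropping any character with no ASCII form,
    # then decode the bytes back to a str.
    return s.encode("ascii", "ignore").decode()

print(clean_unicode("Stribild®"))  # Stribild
print(clean_unicode("café"))       # caf  (the accented é is dropped as well)
```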

import re
import string

# Matches a single ASCII digit
pattern = r'[0-9]'

# Translation table that maps every ASCII punctuation character to a space
PUNCT_TABLE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def removePunct(s):
    # Punctuation becomes spaces; the extra spaces are collapsed in a later step
    return s.translate(PUNCT_TABLE)

def removeDigits(s):
    # Replace each digit with a space
    return re.sub(pattern, ' ', s)

def removeSpaceTrailsString(s):
    # split() with no argument splits on runs of whitespace and drops leading/trailing whitespace
    return " ".join(s.split())

def clean_unicode(s):
    # Drop characters that cannot be encoded as ASCII (e.g. ®), then decode back to str
    return s.encode("ascii", "ignore").decode()

medicinal_chemicals = ['(RS)-2-(4-(2-methylpropyl)phenyl)propanoic acid', 'emtricitabine / tenofovir (Stribild®)', 'Interleukin-2 (Aldesleukin)']

punctCleaned = list(map(removePunct, medicinal_chemicals))
digitsCleaned = list(map(removeDigits, punctCleaned))
spacesCleaned = list(map(removeSpaceTrailsString, digitsCleaned))
unicodeCleaned = list(map(clean_unicode, spacesCleaned))

print(unicodeCleaned)

Output:

['RS methylpropyl phenyl propanoic acid', 'emtricitabine tenofovir Stribild', 'Interleukin Aldesleukin']
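If you prefer a single pass, the same steps can be folded into one function applied with a list comprehension. This is a sketch, not the answer author's code: the names CLEAN_TABLE and clean are hypothetical. It maps both punctuation and digits to spaces through one str.translate table, drops non-ASCII characters, and then collapses whitespace:

```python
import string

medicinal_chemicals = ['(RS)-2-(4-(2-methylpropyl)phenyl)propanoic acid', 'emtricitabine / tenofovir (Stribild®)', 'Interleukin-2 (Aldesleukin)']

# Hypothetical table: every ASCII punctuation character and digit becomes a space
CLEAN_TABLE = str.maketrans(dict.fromkeys(string.punctuation + string.digits, " "))

def clean(s):
    # Punctuation/digits -> spaces, then drop non-ASCII characters such as ®
    ascii_only = s.translate(CLEAN_TABLE).encode("ascii", "ignore").decode()
    # split() splits on whitespace runs and ignores leading/trailing whitespace
    return " ".join(ascii_only.split())

cleaned = [clean(s) for s in medicinal_chemicals]
print(cleaned)
# ['RS methylpropyl phenyl propanoic acid', 'emtricitabine tenofovir Stribild', 'Interleukin Aldesleukin']
```

Mapping characters to a space instead of deleting them keeps strings like "emtricitabine/tenofovir" from being glued into one word; the final split/join removes the extra spaces.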