How can I remove punctuation from all the items in a list in a more pythonic way?

Time:01-03

I have a list of strings with lots of unnecessary punctuation marks. How can I remove all punctuation and digits from these strings in a more pythonic way?

medicinal_chemicals = ['(RS)-2-(4-(2-methylpropyl)phenyl)propanoic acid', 'emtricitabine / tenofovir (Stribild®)', 'Interleukin-2 (Aldesleukin)']

Answer:

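You can use the following code. It replaces punctuation and digits with spaces, drops non-ASCII characters, and then collapses the leftover whitespace.

Note that string.punctuation contains only ASCII punctuation, so a symbol such as ® is removed only by the Unicode-cleaning step. If you want to keep non-ASCII characters, skip that step.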

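import re
import string

# Matches a single ASCII digit
pattern = r'[0-9]'

def removePunct(s):
    # Map every ASCII punctuation character to a space;
    # runs of spaces are collapsed later by removeSpaceTrailsString
    return s.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation)))

def removeDigits(s):
    # Replace each digit with a space
    return re.sub(pattern, ' ', s)

def clean_unicode(s):
    # Drop every non-ASCII character (e.g. ®) by encoding to ASCII with errors ignored
    return s.encode("ascii", "ignore").decode()

def removeSpaceTrailsString(s):
    # Collapse any run of whitespace to a single space and strip both ends
    return " ".join(s.split())

medicinal_chemicals = ['(RS)-2-(4-(2-methylpropyl)phenyl)propanoic acid', 'emtricitabine / tenofovir (Stribild®)', 'Interleukin-2 (Aldesleukin)']

punctCleaned = list(map(removePunct, medicinal_chemicals))
digitsCleaned = list(map(removeDigits, punctCleaned))
# Remove non-ASCII characters before collapsing whitespace, so a removed
# character cannot leave a double space behind. To keep non-ASCII
# characters, skip this line and pass digitsCleaned to the next step.
unicodeCleaned = list(map(clean_unicode, digitsCleaned))
spacesCleaned = list(map(removeSpaceTrailsString, unicodeCleaned))

print(spacesCleaned)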

['RS methylpropyl phenyl propanoic acid', 'emtricitabine tenofovir Stribild', 'Interleukin Aldesleukin']
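A more compact variant, sketched here as an alternative rather than as the answer's own method: a single regular expression (or str.isalpha, which is Unicode-aware) can replace the separate punctuation, digit, and Unicode passes, and a list comprehension can replace the chained map calls. The names NON_ASCII_LETTERS, clean_ascii and clean_keep_letters are hypothetical. The sketch assumes you only want to keep letters and whitespace; clean_ascii keeps ASCII letters only, while clean_keep_letters keeps letters from any script (such as é or β) but still drops symbols such as ®.

```python
import re

medicinal_chemicals = [
    '(RS)-2-(4-(2-methylpropyl)phenyl)propanoic acid',
    'emtricitabine / tenofovir (Stribild®)',
    'Interleukin-2 (Aldesleukin)',
]

# Hypothetical helper pattern: any run of characters that is neither an ASCII
# letter nor whitespace (punctuation, digits, ®, accented letters) is matched.
NON_ASCII_LETTERS = re.compile(r"[^A-Za-z\s]+")


def clean_ascii(s: str) -> str:
    """Hypothetical helper: keep only ASCII letters, then collapse whitespace."""
    return " ".join(NON_ASCII_LETTERS.sub(" ", s).split())


def clean_keep_letters(s: str) -> str:
    """Hypothetical helper: keep letters from any script (str.isalpha is
    Unicode-aware), turn every other non-whitespace character into a space,
    then collapse whitespace."""
    kept = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in s)
    return " ".join(kept.split())


print([clean_ascii(s) for s in medicinal_chemicals])
# ['RS methylpropyl phenyl propanoic acid', 'emtricitabine tenofovir Stribild', 'Interleukin Aldesleukin']
```

For these three inputs both helpers return the same list. They differ only on non-ASCII letters: for example, 'Café-2' becomes 'Caf' with clean_ascii but 'Café' with clean_keep_letters.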