pandas: removing multiple substrings from a string column

Time:01-11

I am trying to remove every string in a list from the values of a column. The list looks like:

char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']  # and so on, for each digit

I tried:

import re
df['X'].apply(lambda x: re.sub('|'.join(char_lst), '', re.escape(x))).astype(str)

but it gives me an error:

re.error: unbalanced parenthesis at position 4

CodePudding user response:

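Use Series.str.replace, and apply re.escape to each item of the list (not to the column value), because . and ) are special characters in a regex: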

import re
import pandas as pd

df = pd.DataFrame({'X': ['2)A', 'B', 'C', 'A', 'D', 'E', 'F', 'D', 'H', 'I1.', 'J3)']})

char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']

# Escape each item so '.' and ')' match literally, then join them into one alternation
df['X'] = df['X'].str.replace("|".join(re.escape(x) for x in char_lst), '', regex=True)
print(df)
    X
0   A
1   B
2   C
3   A
4   D
5   E
6   F
7   D
8   H
9   I
10  J
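
The error in the question most likely comes from joining the raw strings: the unescaped ) makes '1.|1)|...' an invalid pattern, and an unescaped . would match any character rather than a literal dot. A minimal sketch using only the standard re module (the sample values are illustrative, taken from the frame above) shows the difference between the raw and the escaped alternation:

```python
import re

char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']

# Joining the raw strings leaves ')' unescaped, so the pattern cannot compile.
try:
    re.compile('|'.join(char_lst))
    raw_ok = True
except re.error:
    raw_ok = False

# Escaping each item makes '.' and ')' match literally.
pattern = re.compile('|'.join(re.escape(s) for s in char_lst))

values = ['2)A', 'B', 'I1.', 'J3)']  # illustrative sample values
cleaned = [pattern.sub('', v) for v in values]

print(raw_ok)   # False
print(cleaned)  # ['A', 'B', 'I', 'J']
```

Series.str.replace with regex=True applies the same kind of substitution to each element of the column.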

EDIT: To remove any number followed by . or ), use:

df['X'] = df['X'].str.replace(r"\d+[.)]", '', regex=True)
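
As a quick check of what "digits followed by . or )" matches, here is a small sketch with the standard re module. It assumes the intended pattern is one or more digits and then a literal . or ); the multi-digit sample values are made up for illustration:

```python
import re

# One or more digits, then a literal '.' or ')'.
# Inside a character class, '.' and ')' need no escaping.
pattern = re.compile(r"\d+[.)]")

values = ['2)A', 'B', 'I1.', 'J3)', '12.K', '10)L']  # illustrative sample values
cleaned = [pattern.sub('', v) for v in values]

print(cleaned)  # ['A', 'B', 'I', 'J', 'K', 'L']
```

Unlike the fixed list, this pattern also covers numbers with more than one digit, so there is no need to enumerate every prefix.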