I am trying to remove every string in a list from the values of a column, for example:
char_lst = ['1.', '1)', '2.', '2)', '3.', '3)'] # and so on, in the same digit format
I tried:
import re
df['X'].apply(lambda x: re.sub('|'.join(char_lst), '', re.escape(x))).astype(str)
but it gives me this error:
re.error: unbalanced parenthesis at position 4
CodePudding user response:
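Use Series.str.replace and apply re.escape to each item of the list rather than to the column value. The unescaped ) in the joined pattern is what raises the error: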
import re
import pandas as pd
df = pd.DataFrame({'X': ['2)A', 'B', 'C', 'A', 'D', 'E', 'F', 'D', 'H', 'I1.', 'J3)']})
char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']
df['X'] = df['X'].str.replace("|".join(re.escape(x) for x in char_lst),'', regex=True)
print(df)
X
0 A
1 B
2 C
3 A
4 D
5 E
6 F
7 D
8 H
9 I
10 J
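To see why escaping the list items matters, here is a minimal sketch using only the standard re module. The helper name strip_markers is hypothetical and exists only for this illustration; the pandas call above does the same thing column-wide.

```python
import re

char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']

# Joining the raw strings leaves ")" unescaped, so the pattern does not compile.
try:
    re.compile('|'.join(char_lst))
    compiled_ok = True
except re.error:
    compiled_ok = False

# Escaping each item first makes "." and ")" match literally.
pattern = re.compile('|'.join(re.escape(c) for c in char_lst))

def strip_markers(value):
    """Hypothetical helper: remove every listed marker from one string."""
    return pattern.sub('', value)

print(compiled_ok)
print(strip_markers('2)A'))
print(strip_markers('I1.'))
```

Note that re.escape is applied to the patterns, not to the text being searched; escaping the cell value, as in the question, would only add backslashes to the data.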
EDIT: If you need to remove any digits followed by . or ), use:
df['X'] = df['X'].str.replace(r"\d+[.)]", '', regex=True)
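A quick check of a digit-based pattern with plain re, assuming the intent is "one or more digits followed by a literal . or )". The sample values beyond those in the DataFrame above are made up for illustration.

```python
import re

# Assumed intent: one or more digits followed by a literal "." or ")".
# Inside a character class, "." and ")" need no backslash.
marker = re.compile(r"\d+[.)]")

samples = ['2)A', 'B', 'I1.', 'J3)', '10.K', 'v1.5']
cleaned = [marker.sub('', s) for s in samples]
print(cleaned)
```

The pattern matches anywhere in the string, so a value such as 'v1.5' is also altered. If the markers only ever appear at the start or end of a value, anchoring the pattern with ^ or $ may be safer.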