Remove pattern within a column if present in a list in pandas

I have a dataframe such as:

COL1            COL2 
Element1_VAL1   A
Element2_VAL2   B
Something_lima3 C 
Something_logit5 D

and a list such as:

the_list=['_VAL1','_VAL2','_lima3']

I would like to remove from COL1 every substring that appears in the_list, to get:

COL1             COL2 
Element1         A
Element2         B
Something        C 
Something_logit5 D

Here is the dataframe in dict format:

{'COL1': {0: 'Element1_VAL1', 1: 'Element2_VAL2', 2: 'Something_lima3', 3: 'Something_logit5'}, 'COL2 ': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}}
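To reproduce the examples, the dict can be passed straight to pd.DataFrame. This is a minimal setup sketch; note that the second key in the dict is 'COL2 ' with a trailing space, which is kept here exactly as given.

```python
import pandas as pd

# Data copied from the dict above; the 'COL2 ' key keeps its trailing space
data = {'COL1': {0: 'Element1_VAL1', 1: 'Element2_VAL2', 2: 'Something_lima3', 3: 'Something_logit5'},
        'COL2 ': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}}
df = pd.DataFrame(data)

# Substrings to strip from COL1
the_list = ['_VAL1', '_VAL2', '_lima3']
print(df)
```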

Answer:

Try Series.str.replace() with a regex pattern built from the list:

df['new'] = df['COL1'].str.replace('|'.join(the_list), '', regex=True)

print(df)

               COL1 COL2                new
0     Element1_VAL1     A          Element1
1     Element2_VAL2     B          Element2
2   Something_lima3     C         Something
3  Something_logit5     D  Something_logit5

'|'.join(the_list) joins the elements of your list with |, which the regex engine used by str.replace reads as "or". So wherever any of those substrings occurs in a value, it is replaced with ''.
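One caveat: the list items are used as regular expressions, so an item containing characters such as . or + would be interpreted as regex syntax rather than matched literally. A hedged variant, assuming the goal is to strip the substrings only when they end the value, escapes each item with re.escape and anchors the alternation with $:

```python
import re
import pandas as pd

df = pd.DataFrame({'COL1': ['Element1_VAL1', 'Element2_VAL2', 'Something_lima3', 'Something_logit5'],
                   'COL2': ['A', 'B', 'C', 'D']})
the_list = ['_VAL1', '_VAL2', '_lima3']

# Escape each item so regex metacharacters are matched literally,
# group the alternation, and anchor it to the end of the string
pattern = '(?:' + '|'.join(map(re.escape, the_list)) + ')$'
df['new'] = df['COL1'].str.replace(pattern, '', regex=True)
print(df)
```

On the sample data this gives the same result as the unanchored pattern. The difference shows on a hypothetical value like 'X_VAL1_suffix': the unanchored pattern turns it into 'X_suffix', while the anchored one leaves it unchanged.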

Answer:

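You can use pandas Series.replace(), which accepts a list of patterns to be replaced by a single value (an empty string in this case), so you avoid chaining several .str.replace() calls. Pass regex=True: without it, replace() only matches whole cell values, and since no value in COL1 is exactly '_VAL1', '_VAL2' or '_lima3', nothing would be replaced. Try:

df['COL1'] = df['COL1'].replace(the_list, '', regex=True)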
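A small sketch contrasting the two modes of Series.replace on the sample data: with the default regex=False the list items must equal an entire cell value, so nothing changes; with regex=True each item is applied as a pattern inside the strings.

```python
import pandas as pd

df = pd.DataFrame({'COL1': ['Element1_VAL1', 'Element2_VAL2', 'Something_lima3', 'Something_logit5'],
                   'COL2': ['A', 'B', 'C', 'D']})
the_list = ['_VAL1', '_VAL2', '_lima3']

# Default regex=False: only whole-value matches are replaced, so COL1 is unchanged
exact = df['COL1'].replace(the_list, '')

# regex=True: each list item is treated as a regular expression and
# every match inside a value is replaced with the empty string
partial = df['COL1'].replace(the_list, '', regex=True)

print(exact.tolist())
print(partial.tolist())
```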