I have a dataframe such as:
COL1 COL2
Element1_VAL1 A
Element2_VAL2 B
Something_lima3 C
Something_logit5 D
and a list such as:
the_list=['_VAL1','_VAL2','_lima3']
I would like to remove from COL1 every substring that matches an item in the_list, and get:
COL1 COL2
Element1 A
Element2 B
Something C
Something_logit5 D
Here is the dataframe in dict format:
{'COL1': {0: 'Element1_VAL1', 1: 'Element2_VAL2', 2: 'Something_lima3', 3: 'Something_logit5'}, 'COL2': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}}
CodePudding user response:
Try str.replace(), modified slightly:
df['new'] = df['COL1'].str.replace('|'.join(the_list), '',regex=True)
print(df)
COL1 COL2 new
0 Element1_VAL1 A Element1
1 Element2_VAL2 B Element2
2 Something_lima3 C Something
3 Something_logit5 D Something_logit5
'|'.join(the_list) joins all the elements of your list with |, which str.replace (with regex=True) reads as "or". So if any of those substrings is found, it is replaced with ''.
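One caveat: the joined string is treated as a regular expression, so an item containing a character such as . or + would not be matched literally. A minimal sketch of the same idea using only Python's re module, assuming you want every item matched as plain text (the values list simply mirrors COL1 from the question):

```python
import re

the_list = ['_VAL1', '_VAL2', '_lima3']
values = ['Element1_VAL1', 'Element2_VAL2', 'Something_lima3', 'Something_logit5']

# Escape each item so regex metacharacters are matched literally,
# then join the items into a single "this or that" alternation.
pattern = '|'.join(re.escape(item) for item in the_list)

# Remove every match of the pattern from each value.
cleaned = [re.sub(pattern, '', value) for value in values]
print(cleaned)
# ['Element1', 'Element2', 'Something', 'Something_logit5']
```

The same escaped pattern can be passed to df['COL1'].str.replace(pattern, '', regex=True).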
CodePudding user response:
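You can use pandas replace(), which is helpful because it lets you pass a list of elements to be replaced with a single value (an empty string in this case) and avoids multiple calls to .str.replace(). Note that replace() only matches whole cell values by default, so pass regex=True to remove the items as substrings. Try:
df['COL1'] = df['COL1'].replace(the_list, '', regex=True)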
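To see why regex=True matters here, the sketch below compares the two behaviours of Series.replace on the question's data. The extra last value '_VAL1' is a hypothetical row, added only to show what a whole-value match looks like:

```python
import pandas as pd

the_list = ['_VAL1', '_VAL2', '_lima3']
s = pd.Series(['Element1_VAL1', 'Element2_VAL2', 'Something_lima3',
               'Something_logit5', '_VAL1'])  # last value is hypothetical

# Default: an item must equal the whole cell value to be replaced,
# so only the last value changes.
whole = s.replace(the_list, '')

# regex=True: each item is treated as a pattern and removed wherever
# it occurs inside a value.
partial = s.replace(the_list, '', regex=True)

print(whole.tolist())
print(partial.tolist())
```

Without regex=True the first four values come back unchanged, which is not what the question asks for.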