I have the following dataframe called dropthese:
| partname | x1 | x2 | x3....
0 text1_mid1
1 another1_mid2
2 yet_another
And another dataframe called df that looks like this:
text1_mid1_suffix1 | text1_mid1_suffix2 | ... | something_else | another1_mid2_suffix1 | ....
0 .....
1 .....
2 .....
3 .....
I want to drop every column of df whose name contains any of the strings in dropthese['partname'].

For example, since text1_mid1 is in partname, every column whose name contains that partial string, such as text1_mid1_suffix1 and text1_mid1_suffix2, should be dropped.
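The rule described above is a plain substring test against each column name. Here is a minimal sketch in plain Python that uses only the column names listed in the example (the values are elided in the question, so they are not needed):

```python
# Column names and partial names taken from the example above
columns = [
    "text1_mid1_suffix1",
    "text1_mid1_suffix2",
    "something_else",
    "another1_mid2_suffix1",
]
partnames = ["text1_mid1", "another1_mid2", "yet_another"]

# A column is dropped if any partial name occurs anywhere in its name
to_drop = [c for c in columns if any(p in c for p in partnames)]
to_keep = [c for c in columns if c not in to_drop]

print(to_drop)  # ['text1_mid1_suffix1', 'text1_mid1_suffix2', 'another1_mid2_suffix1']
print(to_keep)  # ['something_else']
```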
I have tried:
thisFilter = df.filter(dropthese.partname, regex=True)
df.drop(thisFilter, axis=1)
But I get this error: TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive

What is the proper way to do this filter?
Answer:
I would build a single regex from the partial names and use it with str.contains (or with str.match if the partial name must appear at the start of the column name):
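import re
# escape each partial name so it is matched literally, then join them into one alternation
pattern = '|'.join(dropthese['partname'].map(re.escape))
# keep only the columns whose name contains none of the partial names
out = df.loc[:, ~df.columns.str.contains(pattern)]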
Output:
something_else
0 ...
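A self-contained version of the approach above, with small hypothetical frames that reuse the question's column names (the values are placeholders). The extra column x_text1_mid1 is also hypothetical. It is included to show how str.contains (match anywhere in the name) differs from str.match (match only at the start). The pattern is passed without a capturing group, because pandas warns when str.contains receives a regex that contains match groups.

```python
import re

import pandas as pd

# Hypothetical sample frames; column names follow the question, values are placeholders
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2", "yet_another"]})
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=[
        "text1_mid1_suffix1",
        "text1_mid1_suffix2",
        "something_else",
        "another1_mid2_suffix1",
        "x_text1_mid1",  # hypothetical: contains a partial name, but not at the start
    ],
)

# Escape each partial name so regex metacharacters are matched literally,
# then join into one alternation: "text1_mid1|another1_mid2|yet_another"
pattern = "|".join(dropthese["partname"].map(re.escape))

# Drop columns whose name contains any partial name anywhere
out = df.loc[:, ~df.columns.str.contains(pattern)]

# Drop columns whose name starts with any partial name
out_start = df.loc[:, ~df.columns.str.match(pattern)]

print(out.columns.tolist())        # ['something_else']
print(out_start.columns.tolist())  # ['something_else', 'x_text1_mid1']
```

Escaping with re.escape matters only when a partial name contains characters such as ".", "(" or "+". For plain names like the ones here, the joined pattern is unchanged.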
Why your command failed
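The first positional parameter of filter is items. That means df.filter(dropthese.partname, regex=True) sets both items and regex, and those are mutually exclusive, which is what the TypeError is telling you. The regex parameter also expects the pattern string itself, not True. Pass the joined pattern to the regex parameter of filter, and give drop the column labels of the result (thisFilter.columns) rather than the filtered DataFrame: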
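import re
pattern = '|'.join(dropthese['partname'].map(re.escape))
# pass the pattern string to regex=, instead of the partial names as items
thisFilter = df.filter(regex=pattern)
# drop expects column labels and returns a new DataFrame
out = df.drop(thisFilter.columns, axis=1)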
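The sketch below reproduces the original error and then applies the fix, using the same kind of hypothetical sample frames (question's column names, placeholder values). The Series passed positionally ends up in items, so combining it with regex=True raises the TypeError. Passing only regex=pattern and dropping the matched columns works:

```python
import re

import pandas as pd

# Hypothetical sample frames; column names follow the question, values are placeholders
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2", "yet_another"]})
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["text1_mid1_suffix1", "text1_mid1_suffix2", "something_else", "another1_mid2_suffix1"],
)

# The original call: the Series lands in the first positional parameter (items)
# while regex is also set, so pandas rejects the combination.
error_message = ""
try:
    df.filter(dropthese.partname, regex=True)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The fix: give filter the pattern via regex=, then drop the matched column labels
pattern = "|".join(dropthese["partname"].map(re.escape))
matched = df.filter(regex=pattern)
out = df.drop(columns=matched.columns)
print(out.columns.tolist())  # ['something_else']
```

df.drop(columns=...) is equivalent to df.drop(..., axis=1). It is used here only because it states the axis by name.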