I have a large CSV file with several columns like the following:
M_15_19_yr_ | M_19_25_yr_ | M_25_35_yr_ |
---|---|---|
20 | 34 | 12 |
09 | 21 | 19 |
I want to remove the columns whose names start with M_{age1}_{age2}_yr. I tried:
df = df.loc[:, ~df.columns.str.startswith(('M_15_19_yr_','M_19_25_yr_','M_25_35_yr_'))]
However, I have many such columns. How do I remove all of them without explicitly writing out each column's name?
CodePudding user response:
You can use filter with a negative lookahead, which keeps only the columns whose names do not start with the pattern:
df = df.filter(regex = r'^(?!M_\d+_\d+_yr)')
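As a quick check of the negative-lookahead approach, here is a minimal sketch on a toy frame. The `id` and `city` columns are made up for illustration and are not from the question.

```python
import pandas as pd

# Toy frame: the M_* columns mirror the question; "id" and "city" are hypothetical.
df = pd.DataFrame({
    "id": [1, 2],
    "M_15_19_yr_": [20, 9],
    "M_19_25_yr_": [34, 21],
    "M_25_35_yr_": [12, 19],
    "city": ["A", "B"],
})

# filter(regex=...) keeps the columns whose names match via re.search, so the
# negative lookahead keeps only names that do NOT start with M_<digits>_<digits>_yr.
kept = df.filter(regex=r"^(?!M_\d+_\d+_yr)")

print(list(kept.columns))  # ['id', 'city']
```

Because the lookahead is anchored at the start of the name, a column that merely contains the pattern somewhere in the middle would still be kept.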
CodePudding user response:
You can instead use str.contains along with a regex pattern (the optional `_?` allows for the trailing underscore in your column names):
df = df.loc[:, ~df.columns.str.contains(r'^M_\d+_\d+_yr_?$', regex=True)]
A more general pattern, which also covers the new case given in your comment below, would be:
df = df.loc[:, ~df.columns.str.contains(r'^\w+_(?:\w+_)*\d+_\d+_yr_?$', regex=True)]
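The boolean-mask variant can be sketched the same way. This is only an illustration under stated assumptions: `F_rural_15_19_yr_` is a hypothetical column name standing in for a differently prefixed case (the comment describing the real one is not shown here), `id` is also made up, and the optional `_?` is included to tolerate the trailing underscore seen in the question's column names.

```python
import pandas as pd

# Hypothetical columns: "id" and "F_rural_15_19_yr_" are illustrative only.
df = pd.DataFrame(
    [[1, 20, 34, 7]],
    columns=["id", "M_15_19_yr_", "M_19_25_yr_", "F_rural_15_19_yr_"],
)

# Strict pattern: only names of the form M_<digits>_<digits>_yr (optional trailing "_").
strict = r"^M_\d+_\d+_yr_?$"
# General pattern: any word-character prefix segments before <digits>_<digits>_yr.
general = r"^\w+_(?:\w+_)*\d+_\d+_yr_?$"

# str.contains returns a boolean array over the column names; ~ inverts it
# so that .loc keeps the columns that do NOT match.
strict_kept = df.loc[:, ~df.columns.str.contains(strict, regex=True)]
general_kept = df.loc[:, ~df.columns.str.contains(general, regex=True)]

print(list(strict_kept.columns))
print(list(general_kept.columns))
```

With the strict pattern the differently prefixed column survives; with the general pattern only `id` remains.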