Remove columns which match some pattern in python

Time:08-25

I have a large CSV file with several columns, as follows:

M_15_19_yr_ M_19_25_yr_ M_25_35_yr_
20 34 12
09 21 19

I want to remove all columns whose names start with M_{age1}_{age2}_yr. I tried using:

df = df.loc[:, ~df.columns.str.startswith(('M_15_19_yr_','M_19_25_yr_','M_25_35_yr_'))]

However, I have many such columns. How do I remove all of them without writing out each column's name explicitly?

CodePudding user response:

You can use filter with a negative lookahead, which keeps only the columns whose names do not start with the pattern:

df = df.filter(regex=r'^(?!M_\d+_\d+_yr)')
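
To see the effect of the filter call, here is a minimal sketch on made-up data. The "id" and "score" columns are hypothetical and only stand in for columns you want to keep; the M_ columns reuse the names and values from the question.

```python
import pandas as pd

# Hypothetical sample frame: "id" and "score" are invented columns to keep.
df = pd.DataFrame({
    "id": [1, 2],
    "M_15_19_yr_": [20, 9],
    "M_19_25_yr_": [34, 21],
    "M_25_35_yr_": [12, 19],
    "score": [0.5, 0.7],
})

# Keep only columns whose name does NOT start with M_<digits>_<digits>_yr.
# The negative lookahead (?!...) rejects a name when the pattern follows
# the start of the string.
kept = df.filter(regex=r"^(?!M_\d+_\d+_yr)")
print(list(kept.columns))
```

Since filter returns a new DataFrame, the original df is left unchanged unless you assign the result back to it.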

CodePudding user response:

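You may instead use str.contains along with a regex pattern. The _? makes the trailing underscore optional, so names such as M_15_19_yr_ from your sample also match:

df = df.loc[:, ~df.columns.str.contains(r'^M_\d+_\d+_yr_?$', regex=True)]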

A more general pattern that also covers the new case given in your comment below would be:

df = df.loc[:, ~df.columns.str.contains(r'^\w+_(?:\w+_)*\d+_\d+_yr_?$', regex=True)]
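
Before dropping anything, it may help to check which names a general pattern like this actually catches. The sketch below uses only the standard re module on a hypothetical list of column names; the "F_" and "Total_M_" prefixes are invented for illustration and are not the case from the comment.

```python
import re

# Hypothetical column names, invented for illustration only.
columns = ["id", "M_15_19_yr_", "F_19_25_yr_", "Total_M_25_35_yr", "score_2020"]

# One or more word prefixes, then <digits>_<digits>_yr, with an optional
# trailing underscore before the end of the name.
pattern = re.compile(r"^\w+_(?:\w+_)*\d+_\d+_yr_?$")

to_drop = [c for c in columns if pattern.search(c)]
to_keep = [c for c in columns if not pattern.search(c)]

print("drop:", to_drop)
print("keep:", to_keep)
```

If the split looks right, the list can then be passed to df.drop(columns=to_drop) as an alternative to the boolean mask.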