How to drop columns that end with a specific wildcard string?

Time:03-10

I have a series of columns:

 COLUMNS = contract_number, award_date_x,   publication_date_x, award_date_y,   publication_date_y, award_date, publication_date    

I would like to drop all of the 'publication_date' columns that end with '_[a-z]', so that my final result would look like this:

 COLUMNS = contract_number, award_date, award_date_x, award_date_y, publication_date

I have tried the following with no luck:

df_merge=df_merge.drop(c for c in df_merge.columns if c.str.contains('publication_date_[a-z]$'))

Thanks

CodePudding user response:

Try this,

import re

columns = df_merge.columns.tolist() # getting all columns

for col in columns:
    if re.match(r"publication_date_[a-z]$",col): #regex for your match case
        df_merge.drop([col], axis=1, inplace=True) # If regex matches, then remove the column

df_merge.head() # Filtered dataframe
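As a variation on the loop above, the matching names can be collected first and dropped in a single call. This is only a sketch: the empty df_merge below is a hypothetical stand-in built from the column names in the question, and re.fullmatch is used so the pattern has to cover the whole column name.

```python
import re
import pandas as pd

# Hypothetical stand-in for df_merge, using the column names from the question.
df_merge = pd.DataFrame(columns=[
    "contract_number", "award_date_x", "publication_date_x",
    "award_date_y", "publication_date_y", "award_date", "publication_date",
])

# fullmatch requires the entire name to match, so no trailing "$" is needed.
pattern = re.compile(r"publication_date_[a-z]")
to_drop = [col for col in df_merge.columns if pattern.fullmatch(col)]

# Drop every matching column in one call instead of one call per column.
df_merge = df_merge.drop(columns=to_drop)

print(list(df_merge.columns))
# ['contract_number', 'award_date_x', 'award_date_y', 'award_date', 'publication_date']
```

Assigning the result back avoids inplace=True, which is a matter of style here rather than a requirement.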

CodePudding user response:

lis = ["publication_date_x", "publication_date", "publication_date_x_y_y", "hello"]

new_list = [x for x in lis if not x.startswith('publication_date_')]

The output will be:

new_list: ["publication_date", "hello"]
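To apply the same idea to the DataFrame itself, the filtered list can be used to select the columns to keep. A minimal sketch, assuming a hypothetical df_merge with the columns from the question; note that startswith removes every name beginning with 'publication_date_', not only those with a single-letter suffix.

```python
import pandas as pd

# Hypothetical stand-in for df_merge, using the column names from the question.
df_merge = pd.DataFrame(columns=[
    "contract_number", "award_date_x", "publication_date_x",
    "award_date_y", "publication_date_y", "award_date", "publication_date",
])

# Keep every column whose name does not start with the suffixed prefix.
keep = [c for c in df_merge.columns if not c.startswith("publication_date_")]
df_merge = df_merge[keep]

print(list(df_merge.columns))
# ['contract_number', 'award_date_x', 'award_date_y', 'award_date', 'publication_date']
```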

CodePudding user response:

str.contains is a vectorized method of the .str accessor, so it has to be called on the whole set of column names (for example, as a Series) rather than on each name inside a loop.

series_cols = pd.Series(df_merge.columns)
bool_series_cols = series_cols.str.contains('publication_date_[a-z]$')

df_merge.drop([c for c, bool_c in zip(series_cols, bool_series_cols) if bool_c], axis=1, inplace=True)
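A shorter form of the same approach is possible because df_merge.columns exposes the .str accessor directly, so the intermediate Series can be skipped. This is a sketch under the same assumption as before: df_merge here is a hypothetical empty frame with the question's column names, and the pattern is anchored at both ends so that only the exact suffixed names match.

```python
import pandas as pd

# Hypothetical stand-in for df_merge, using the column names from the question.
df_merge = pd.DataFrame(columns=[
    "contract_number", "award_date_x", "publication_date_x",
    "award_date_y", "publication_date_y", "award_date", "publication_date",
])

# Boolean mask over the column names: True where the name should be dropped.
mask = df_merge.columns.str.contains(r"^publication_date_[a-z]$")

# Keep only the columns where the mask is False.
df_merge = df_merge.loc[:, ~mask]

print(list(df_merge.columns))
# ['contract_number', 'award_date_x', 'award_date_y', 'award_date', 'publication_date']
```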