I have a series of columns:
COLUMNS = contract_number, award_date_x, publication_date_x, award_date_y, publication_date_y, award_date, publication_date
I would like to drop all of the 'publication_date' columns that end with '_[a-z]', so that my final result would look like this:
COLUMNS = contract_number, award_date, award_date_x, award_date_y, publication_date
I have tried the following with no luck:
df_merge=df_merge.drop(c for c in df_merge.columns if c.str.contains('publication_date_[a z]$'))
Thanks
CodePudding user response:
Try this,
import re

columns = df_merge.columns.tolist()  # getting all columns
for col in columns:
    if re.match(r"publication_date_[a-z]$", col):  # regex for your match case
        df_merge.drop([col], axis=1, inplace=True)  # If regex matches, then remove the column
df_merge.head()  # Filtered dataframe
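The same idea can be written without dropping columns one at a time. The sketch below is only an illustration: the empty `df_merge` is a hypothetical stand-in built from the column names in the question, and `re.fullmatch` is used in place of `re.match` with a trailing `$`.

```python
import re
import pandas as pd

# Hypothetical stand-in for df_merge, using the column names from the question.
df_merge = pd.DataFrame(columns=[
    "contract_number", "award_date_x", "publication_date_x",
    "award_date_y", "publication_date_y", "award_date", "publication_date",
])

# fullmatch requires the whole column name to match the pattern.
pattern = re.compile(r"publication_date_[a-z]")
to_drop = [col for col in df_merge.columns if pattern.fullmatch(col)]

# Drop all matching columns in a single call and rebind the result.
df_merge = df_merge.drop(columns=to_drop)
print(df_merge.columns.tolist())
```

Collecting the names first and calling `drop` once avoids modifying the frame repeatedly inside the loop.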
CodePudding user response:
lis = ["publication_date_x", "publication_date", "publication_date_x_y_y", "hello"]
new_list = [x for x in lis if not x.startswith('publication_date_')]
The output will be:
new_list: ["publication_date", "hello"]
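Applied to a DataFrame, the list above would be the column names. A minimal sketch, assuming a hypothetical `df_example` with a few of the columns from the question:

```python
import pandas as pd

# Hypothetical frame; only the column names matter for this example.
df_example = pd.DataFrame(columns=[
    "contract_number", "award_date_x", "publication_date_x",
    "publication_date_y", "publication_date",
])

# Keep every column that does not start with the suffixed prefix.
keep = [c for c in df_example.columns if not c.startswith("publication_date_")]
df_filtered = df_example[keep]
print(df_filtered.columns.tolist())
```

Note that `startswith` is broader than the regex in the question: it also removes names such as "publication_date_x_y_y", not only those with a single-letter suffix.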
CodePudding user response:
If you want to use str.contains, you'll need to call it on the column names as a whole rather than on each name, for example by making the list of columns a Series:
series_cols = pd.Series(df_merge.columns)
bool_series_cols = series_cols.str.contains('publication_date_[a-z]$')
df_merge.drop([c for c, bool_c in zip(series_cols, bool_series_cols) if bool_c], axis=1, inplace=True)
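A shorter variant is possible because `df.columns` is a pandas Index, which has its own `.str` accessor. The sketch below assumes a hypothetical `df_example` built from the column names in the question and selects columns with a boolean mask instead of calling `drop`.

```python
import pandas as pd

# Hypothetical frame using the column names from the question.
df_example = pd.DataFrame(columns=[
    "contract_number", "award_date_x", "publication_date_x",
    "award_date_y", "publication_date_y", "award_date", "publication_date",
])

# True for each column name that ends with publication_date_<lowercase letter>.
mask = df_example.columns.str.contains(r"publication_date_[a-z]$")

# Keep only the columns where the mask is False.
df_result = df_example.loc[:, ~mask]
print(df_result.columns.tolist())
```

This returns a new frame and leaves `df_example` unchanged, which may be preferable to `inplace=True` when the original is still needed.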