Is there a way in Python/Pandas to use a generic variable name with a wildcard to select all similar

Time:12-09

In Stata, if I typed Week_*, it would select all columns Week_1, Week_2, etc. Is there a similar way to do this in Python/Pandas?

Code example, including last line for what I want to do.

# One-hot Encode Week: Create variables Week_1, Week_2, ... etc.
dt_temp0 = dt_temp0.join(pd.get_dummies(dt_temp0['Week'],prefix='Week'))

# Features to Use
feat_cols = ['lag2_tfk_total','lag3_tfk_total','lag2_Trips_pp','lag3_Trips_pp',
             'ClinicID_fac', 'Week_*']

x_train = dt_temp1.loc[dt_temp1['train'] == 1,feat_cols]

CodePudding user response:

You could select your week columns with a list comprehension:

week_cols = [col for col in dt_temp1.columns if col.startswith('Week_')]
feat_cols = ['lag2_tfk_total','lag3_tfk_total','lag2_Trips_pp','lag3_Trips_pp',
             'ClinicID_fac', *week_cols]

You can combine these into one line if you want.
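For what it's worth, here is a rough sketch of that one-line version. The small dt_temp1 frame and its values are made up purely so the snippet runs on its own, and the feature list is shortened to two of the original columns.

```python
import pandas as pd

# Toy stand-in for dt_temp1; the values are invented for illustration only.
dt_temp1 = pd.DataFrame({
    'lag2_tfk_total': [1.0, 2.0, 3.0],
    'ClinicID_fac': [10, 11, 12],
    'Week': [1, 2, 1],
    'train': [1, 0, 1],
})

# One-hot encode Week, as in the question (adds Week_1 and Week_2 here).
dt_temp1 = dt_temp1.join(pd.get_dummies(dt_temp1['Week'], prefix='Week'))

# Fixed feature names plus every column starting with 'Week_', in one statement.
feat_cols = ['lag2_tfk_total', 'ClinicID_fac',
             *[col for col in dt_temp1.columns if col.startswith('Week_')]]

x_train = dt_temp1.loc[dt_temp1['train'] == 1, feat_cols]
```

Note that the plain 'Week' column is not picked up, because the prefix being tested includes the underscore.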

CodePudding user response:

I actually found another way to do this as well, using filter(). Then you just have to concatenate the two lists of column names. Thanks for all the help!

week_cols = dt_temp0.filter(regex="Week_").columns.tolist()
feat_cols = ['ClinicID_fac','lag2_tfk_total','lag3_tfk_total','lag2_Trips_pp','lag3_Trips_pp'] + week_cols
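One caveat that may matter with filter(): the regex is matched anywhere in the column name, so an unanchored "Week_" would also pick up a column such as prev_Week_1. A small sketch of the difference follows; the empty frame and the prev_Week_1 column are hypothetical, made up only to show the behaviour.

```python
import pandas as pd

# Hypothetical column set; 'prev_Week_1' exists only to show the difference.
df = pd.DataFrame(columns=['ClinicID_fac', 'Week', 'Week_1', 'Week_2', 'prev_Week_1'])

# Unanchored pattern: matches 'Week_' anywhere in the name.
unanchored = df.filter(regex='Week_').columns.tolist()

# Anchored pattern: only names that start with 'Week_'.
anchored = df.filter(regex='^Week_').columns.tolist()

# Plain list concatenation builds the final feature list.
feat_cols = ['ClinicID_fac'] + anchored
```

If none of your other columns contain "Week_" the two patterns give the same result, so the anchor is just a safeguard.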