I have a dataframe with the following columns (about 4000):
QA1_1, Q8_r1_c3_1, Q19b_5_1 , ... , QA1_32, Q8_r1_c3_32, Q19b_5_32
I have created two dictionaries: one maps the part of each column name before the final '_' to the name I want to use instead, and the other maps the endings _1 to _32 to their labels. For example:
dict_1 = {'QA1' : 'electric',
'Q8_r1_c3' : 'solar',
...
'Q19b_5' : 'urban'}
dict_2 = {'_1' : 'Restaurants',
'_2' : 'Hotels',
...
'_32' : 'School'}
My question is: how do I rename my columns so that they use both the general variable names and the labels associated with the suffixes?
The desired end result:
electric_Restaurants, solar_Restaurants, urban_Restaurants, ... , electric_School, solar_School, urban_School
CodePudding user response:
Use str.replace:
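# dict_2 keys keep the leading underscore ('_1'), so add it back for the lookup
f = lambda m: f"{dict_1.get(m.group(1), m.group(1))}_{dict_2.get('_' + m.group(2), m.group(2))}"
# pass regex=True explicitly (the default is False since pandas 2.0)
df.columns = df.columns.str.replace(r'(.*)_([^_]+)$', f, regex=True)
Output:
electric_Restaurants solar_Restaurants urban_Restaurants electric_School solar_School urban_School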
0
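To try the regex approach end to end, here is a minimal self-contained sketch. It assumes the dictionaries contain only the entries shown in the question and that dict_2 keeps the leading underscore in its keys; the one-row frame and the name repl are made up for illustration.

```python
import pandas as pd

# Only the entries shown in the question; the real dictionaries are longer.
dict_1 = {'QA1': 'electric', 'Q8_r1_c3': 'solar', 'Q19b_5': 'urban'}
dict_2 = {'_1': 'Restaurants', '_32': 'School'}

# Hypothetical one-row frame with six of the original column names.
cols = ['QA1_1', 'Q8_r1_c3_1', 'Q19b_5_1', 'QA1_32', 'Q8_r1_c3_32', 'Q19b_5_32']
df = pd.DataFrame([[0] * len(cols)], columns=cols)

def repl(m):
    # group(1) is everything before the last '_', group(2) the trailing suffix
    prefix = dict_1.get(m.group(1), m.group(1))
    # dict_2 keys include the underscore, so add it back before the lookup
    suffix = dict_2.get('_' + m.group(2), m.group(2))
    return f'{prefix}_{suffix}'

df.columns = df.columns.str.replace(r'(.*)_([^_]+)$', repl, regex=True)
print(list(df.columns))
```

Names that have no entry in either dictionary fall back to their original prefix or suffix, so nothing is silently dropped.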
CodePudding user response:
With pandas.Series.str.rsplit:
df.columns = [f"{dict_1.get(x[0], x[0])}_{dict_2.get(f'_{x[1]}', x[1])}"
for x in df.columns.str.rsplit("_", n=1)]
Output:
#BEFORE :
Index(['QA1_1', 'Q8_r1_c3_1', 'Q19b_5_1', 'QA1_32', 'Q8_r1_c3_32',
'Q19b_5_32'],
dtype='object')
#AFTER :
Index(['electric_Restaurants', 'solar_Restaurants', 'urban_Restaurants',
'electric_School', 'solar_School', 'urban_School'],
dtype='object')
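With about 4000 columns it may also help to see which names were not touched by the mapping. The following is a rough sketch of the same split-on-last-underscore idea written as a plain function and passed to DataFrame.rename; rename_column, the 'ID' column, and 'Q99_7' are hypothetical names used only for this example, and the dictionaries hold just the entries shown in the question.

```python
import pandas as pd

dict_1 = {'QA1': 'electric', 'Q8_r1_c3': 'solar', 'Q19b_5': 'urban'}
dict_2 = {'_1': 'Restaurants', '_32': 'School'}

def rename_column(col):
    """Hypothetical helper: map 'prefix_suffix' through dict_1 and dict_2."""
    prefix, sep, suffix = col.rpartition('_')
    if not sep:  # no underscore at all: leave the name untouched
        return col
    return f"{dict_1.get(prefix, prefix)}_{dict_2.get(sep + suffix, suffix)}"

# 'ID' and 'Q99_7' are made-up columns that have no mapping.
cols = ['ID', 'QA1_1', 'Q8_r1_c3_32', 'Q99_7']
df = pd.DataFrame([[0] * len(cols)], columns=cols)

renamed = df.rename(columns=rename_column)

# Columns whose name did not change, worth a manual look on a wide frame.
unchanged = [c for c in df.columns if rename_column(c) == c]
print(list(renamed.columns))
print(unchanged)
```

Using rename returns a new frame and leaves df as it was, which makes it easy to compare the old and new names before committing to the change.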