Home > Software engineering >  How to achieve this transformation in Pandas?
How to achieve this transformation in Pandas?

Time:07-25

I have a dataframe df1 :-

City Territory Region Area Target
Chicopee Springfield MA Northeast National 58761
Feeding Hills Springfield MA Northeast National 65204
Feeding Hills Springfield MA Northeast National 79862
Feeding Hills Springfield MA Northeast National 67247
Holyoke Springfield MA Northeast East 64347
Holyoke Worcester MA Northeast East 73473

Imputing with average value of target at that level using this codeI have this :

for col in columns: #columns=['City', 'Territory`,'Region','Area']
    avg_tar= df.groupby(col).agg(**{'avg_tar_by_' col: ('Target', np.mean)})
    df = df.merge(avg_tar, on=col)
df = df.drop(columns=columns)
df = df.rename(columns={'avg_tar_by_' col: col for col in columns})

City Territory Region Area Target
58761 67084.2 68149 67768.5 58761
70771 67084.2 68149 67768.5 65204
70771 67084.2 68149 67768.5 79862
70771 67084.2 68149 67768.5 67247
68910 67084.2 68149 68910 64347
68910 73473 68149 68910 73473

I have another dataframe df2 ;I want to map the categories across all columns for df2 with the mapping values obtained in df1 :-

City Territory Region Area Target
Chicopee Springfield MA Northeast National 58761
Chicopee Springfield MA Northeast East 65204
Feeding Hills Springfield MA Northeast East 79862
Feeding Hills Worcester MA Northeast East 67247
Feeding Hills Worcester MA Northeast East 64347
Holyoke Worcester MA Northeast East 73473

Expected Output :

City Territory Region Area Target
58761 67084.2 68149 67768.5 58761
58761 67084.2 68149 67768.5 65204
70771 67084.2 68149 67768.5 79862
70771 73473 68149 68910 67247
70771 73473 68149 68910 64347
68910 73473 68149 68910 73473

CodePudding user response:

IIUC, you are trying to map the mean Target values for each group of each column in df1 to df2 - which you can achieve by extracting the mean Target values into a dictionary -

# Creating the mapping dict
grouping_cols = [col for col in df1.columns if col != 'Target']
g_map = {g: dict(df1.groupby(g)['Target'].agg('mean')) for g in grouping_cols}

# Mapping from the dict
df3 = pd.DataFrame()
for g in g_map:
    df3[g] = df2[g].map(g_map[g])
df3['Target'] = df2['Target']

Output (df3)

    City  Territory  Region     Area  Target
0  58761    67084.2   68149  67768.5   58761
1  58761    67084.2   68149  68910.0   65204
2  70771    67084.2   68149  68910.0   79862
3  70771    73473.0   68149  68910.0   67247
4  70771    73473.0   68149  68910.0   64347
5  68910    73473.0   68149  68910.0   73473
  • Related