I have a dataframe df1
:-
City | Territory | Region | Area | Target |
---|---|---|---|---|
Chicopee | Springfield MA | Northeast | National | 58761 |
Feeding Hills | Springfield MA | Northeast | National | 65204 |
Feeding Hills | Springfield MA | Northeast | National | 79862 |
Feeding Hills | Springfield MA | Northeast | National | 67247 |
Holyoke | Springfield MA | Northeast | East | 64347 |
Holyoke | Worcester MA | Northeast | East | 73473 |
Imputing with average value of target at that level using this codeI have this :
for col in columns: #columns=['City', 'Territory`,'Region','Area']
avg_tar= df.groupby(col).agg(**{'avg_tar_by_' col: ('Target', np.mean)})
df = df.merge(avg_tar, on=col)
df = df.drop(columns=columns)
df = df.rename(columns={'avg_tar_by_' col: col for col in columns})
City | Territory | Region | Area | Target |
---|---|---|---|---|
58761 | 67084.2 | 68149 | 67768.5 | 58761 |
70771 | 67084.2 | 68149 | 67768.5 | 65204 |
70771 | 67084.2 | 68149 | 67768.5 | 79862 |
70771 | 67084.2 | 68149 | 67768.5 | 67247 |
68910 | 67084.2 | 68149 | 68910 | 64347 |
68910 | 73473 | 68149 | 68910 | 73473 |
I have another dataframe df2
;I want to map the categories across all columns for df2
with the mapping values obtained in df1
:-
City | Territory | Region | Area | Target |
---|---|---|---|---|
Chicopee | Springfield MA | Northeast | National | 58761 |
Chicopee | Springfield MA | Northeast | East | 65204 |
Feeding Hills | Springfield MA | Northeast | East | 79862 |
Feeding Hills | Worcester MA | Northeast | East | 67247 |
Feeding Hills | Worcester MA | Northeast | East | 64347 |
Holyoke | Worcester MA | Northeast | East | 73473 |
Expected Output :
City | Territory | Region | Area | Target |
---|---|---|---|---|
58761 | 67084.2 | 68149 | 67768.5 | 58761 |
58761 | 67084.2 | 68149 | 67768.5 | 65204 |
70771 | 67084.2 | 68149 | 67768.5 | 79862 |
70771 | 73473 | 68149 | 68910 | 67247 |
70771 | 73473 | 68149 | 68910 | 64347 |
68910 | 73473 | 68149 | 68910 | 73473 |
CodePudding user response:
IIUC, you are trying to map the mean Target
values for each group of each column in df1
to df2
- which you can achieve by extracting the mean Target
values into a dictionary -
# Creating the mapping dict
grouping_cols = [col for col in df1.columns if col != 'Target']
g_map = {g: dict(df1.groupby(g)['Target'].agg('mean')) for g in grouping_cols}
# Mapping from the dict
df3 = pd.DataFrame()
for g in g_map:
df3[g] = df2[g].map(g_map[g])
df3['Target'] = df2['Target']
Output (df3)
City Territory Region Area Target
0 58761 67084.2 68149 67768.5 58761
1 58761 67084.2 68149 68910.0 65204
2 70771 67084.2 68149 68910.0 79862
3 70771 73473.0 68149 68910.0 67247
4 70771 73473.0 68149 68910.0 64347
5 68910 73473.0 68149 68910.0 73473