My DataFrame:
  Col X Col Y   ID  Value
      A     a  'r'      3
      A     a  'b'      2
      A     a  'c'      1
      B     b  'd'      5
      B     b  's'      6
      B     b  'd'      7
Output required:
  Col X Col Y                        Out
      A     a   {'r': 3, 'b': 2, 'c': 1}
      B     b   {'d': 5, 's': 6, 'd': 7}
Approach tried so far:
df = df.set_index(['Col X', 'Col Y', 'ID']).Value
dict_column = {k: df.xs((k, v)).to_dict() for k,v,v2 in df.index}
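For reproducibility, here is a minimal sketch that rebuilds the sample frame; it assumes the quotes around the ID values are literal characters in the data, which is why the answers below strip them:

import pandas as pd

df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'a', 'a', 'b', 'b', 'b'],
    'ID':    ["'r'", "'b'", "'c'", "'d'", "'s'", "'d'"],
    'Value': [3, 2, 1, 5, 6, 7],
})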
CodePudding user response:
Use GroupBy.apply with a lambda function:
df['ID'] = df['ID'].str.strip("'")
df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']]
         .apply(lambda x: dict(x.to_numpy()))
         .reset_index(name='Out'))
print (df1)
  Col X Col Y                       Out
0     A     a  {'r': 3, 'b': 2, 'c': 1}
1     B     b          {'d': 7, 's': 6}
Duplicate keys cannot exist in a Python dictionary, which is why 'd': 5 is dropped above. You can aggregate the values instead, e.g. by sum:
df['ID'] = df['ID'].str.strip("'")
df = df.groupby(['Col X', 'Col Y','ID'], as_index=False)['Value'].sum()
print (df)
  Col X Col Y ID  Value
0     A     a  b      2
1     A     a  c      1
2     A     a  r      3
3     B     b  d     12
4     B     b  s      6
df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']]
         .apply(lambda x: dict(x.to_numpy()))
         .reset_index(name='Out'))
print (df1)
  Col X Col Y                        Out
0     A     a   {'b': 2, 'c': 1, 'r': 3}
1     B     b          {'d': 12, 's': 6}
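For reference, the two steps can also be chained into a single expression; this is only a sketch of the same logic shown above, not a different method:

# strip quotes, sum duplicated IDs per group, then build the per-group dict
df1 = (df.assign(ID=df['ID'].str.strip("'"))
         .groupby(['Col X', 'Col Y', 'ID'], as_index=False)['Value'].sum()
         .groupby(['Col X', 'Col Y'])[['ID', 'Value']]
         .apply(lambda x: dict(x.to_numpy()))
         .reset_index(name='Out'))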
CodePudding user response:
You can create a pd.Series inside apply and use to_dict:
output = (df.groupby(['Col X', 'Col Y'])[['ID', 'Value']]
            .apply(lambda x: pd.Series(x['Value'].values, index=x['ID']).to_dict()))
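The result here is a Series indexed by ['Col X', 'Col Y'] whose values are the dictionaries. If you also need the Out column layout from the question, one extra step converts it (a small addition, assuming output is the Series built above):

# turn the indexed Series of dicts into a frame with an 'Out' column
output = output.reset_index(name='Out')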
CodePudding user response:
You can use groupby.apply with dict and zip:
(df.groupby(['Col X', 'Col Y'])
   .apply(lambda x: dict(zip(x['ID'], x['Value'])))
   .reset_index(name='Out')
)
Output:
  Col X Col Y                       Out
0     A     a  {'r': 3, 'b': 2, 'c': 1}
1     B     b          {'d': 7, 's': 6}
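Note that dict(zip(...)) keeps only the last value seen for a repeated key, which is why 'd': 5 disappears from group B. A plain-Python sketch of the same behaviour:

# the later pair ('d', 7) overwrites the earlier ('d', 5)
print(dict(zip(['d', 's', 'd'], [5, 6, 7])))   # {'d': 7, 's': 6}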
If you want to aggregate the values of duplicated keys instead:
(df.groupby(['Col X', 'Col Y'])
   .apply(lambda x: x['Value'].groupby(x['ID']).sum().to_dict())
   .reset_index(name='Out')
)
Output:
  Col X Col Y                        Out
0     A     a   {'b': 2, 'c': 1, 'r': 3}
1     B     b          {'d': 12, 's': 6}