I have a pandas DataFrame that I am trying to aggregate, storing selected columns as dicts.
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'lat': [37.7825, 37.7825, 37.7836],
    'lon': [-122.4148, -122.4148, -122.4127],
    'v': [10, 20, 10],
    'b': [1, 2, 1],
    'r': [1000, 1300, 1100],
    's': [650, 720, 600]
})
I'd like to aggregate the DataFrame so that every unique combination of b, r and s is stored as a list of dicts in one column, and v is the mean for each group.
The solution should account for edge cases such as NaNs in b, r or s. If there are NaNs, they should not be stored in the dicts.
Expected output:
id lat lon v new
1 37.7825 -122.4148 15 [{'b': 1, 'r': 1000, 's': 650}, {'b': 2, 'r': 1300, 's': 720}]
2 37.7836 -122.4127 10 [{'b': 1, 'r': 1100, 's': 600}]
CodePudding user response:
You can convert the b, r and s columns to dictionaries first, then aggregate.
df['new'] = df[['b', 'r', 's']].to_dict(orient='records')
out = df.groupby(['id', 'lat', 'lon'], as_index=False).agg({'v': 'mean', 'new': list})
print(out)
id lat lon v \
0 1 37.7825 -122.4148 15.0
1 2 37.7836 -122.4127 10.0
new
0 [{'b': 1, 'r': 1000, 's': 650}, {'b': 2, 'r': 1300, 's': 720}]
1 [{'b': 1, 'r': 1100, 's': 600}]
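The snippet above does not handle the NaN case the question asks about. One possible reading is that a NaN value should simply be left out of its dict. A minimal sketch under that assumption follows; the helper name records_without_nan and the NaN placed in the sample data are illustrative, not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'lat': [37.7825, 37.7825, 37.7836],
    'lon': [-122.4148, -122.4148, -122.4127],
    'v': [10, 20, 10],
    'b': [1, 2, np.nan],      # NaN added for illustration only
    'r': [1000, 1300, 1100],
    's': [650, 720, 600],
})


def records_without_nan(frame):
    """Return one dict per row, leaving out keys whose value is NaN."""
    return [
        {key: value for key, value in row.items() if pd.notna(value)}
        for row in frame.to_dict(orient='records')
    ]


# Build the dicts first, then aggregate exactly as in the answer above.
df['new'] = records_without_nan(df[['b', 'r', 's']])
out = df.groupby(['id', 'lat', 'lon'], as_index=False).agg({'v': 'mean', 'new': list})
print(out)
```

Note that a NaN in b turns the whole column into floats, so the remaining b values appear as 1.0 and 2.0 in the dicts. A row where b, r and s are all NaN would produce an empty dict, which may need to be filtered out separately.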
CodePudding user response:
You can use to_dict to convert the columns to dictionaries, and groupby with agg to combine them.
Your example does not show any NaN or duplicated combination, but if that were the case, you could use dropna/drop_duplicates on the b/r/s columns first.
(df
 .assign(new=pd.Series(df[['b', 'r', 's']].to_dict('index')))
 .groupby('id')
 .agg({'lat': 'first',
       'lon': 'first',
       'v': 'mean',
       'new': list
       })
)
Output:
lat lon v new
id
1 37.7825 -122.4148 15 [{'b': 1, 'r': 1000, 's': 650}, {'b': 2, 'r': ...
2 37.7836 -122.4127 10 [{'b': 1, 'r': 1100, 's': 600}]
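The dropna/drop_duplicates suggestion above could be sketched roughly as follows. The extra rows in the sample data are made up for illustration, and the sketch assumes that the mean of v should still be computed over all rows, including those whose b/r/s values are incomplete or repeated; if that is not wanted, compute the mean on the cleaned frame instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [37.7825, 37.7825, 37.7825, 37.7836, 37.7836],
    'lon': [-122.4148, -122.4148, -122.4148, -122.4127, -122.4127],
    'v': [10, 20, 30, 10, 50],
    'b': [1, 2, 2, 1, np.nan],          # last row has a NaN (illustrative)
    'r': [1000, 1300, 1300, 1100, 900],
    's': [650, 720, 720, 600, 500],     # second and third rows repeat b/r/s
})

cols = ['b', 'r', 's']

# Mean of v per id, computed on all rows (an assumption, see above).
means = df.groupby('id').agg(
    lat=('lat', 'first'),
    lon=('lon', 'first'),
    v=('v', 'mean'),
)

# Keep only complete rows, and one row per (id, b, r, s) combination.
clean = df.dropna(subset=cols).drop_duplicates(subset=['id'] + cols)
clean = clean.assign(new=clean[cols].to_dict('records'))

# Collect the dicts per id and attach them to the aggregated frame.
new = clean.groupby('id')['new'].agg(list)
out = means.join(new)
print(out)
```

Because the join is a left join on id, a group in which every row has a NaN in b, r or s keeps its mean of v but gets NaN in the new column rather than an empty list.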