Pandas DataFrame GroupBy - Aggregate and store dicts

I have a pandas DataFrame that I am trying to aggregate, storing selected columns as dicts.

import pandas as pd

df = pd.DataFrame({
                   'id': [1, 1, 2],
                   'lat': [37.7825, 37.7825, 37.7836],
                   'lon': [-122.4148, -122.4148, -122.4127],
                   'v': [10, 20, 10],
                   'b': [1, 2, 1],
                   'r': [1000, 1300, 1100],
                   's': [650, 720, 600]
                 })

I'd like to aggregate the DataFrame so that, for each group, every unique combination of b, r and s is stored as a dict, with the dicts collected into a list in one new column, and v is the mean for the group.

The solution should also account for edge cases such as NaNs in b, r or s: if there are NaNs, they should not be stored in the dicts.

Expected output:

id lat      lon        v   new
1  37.7825  -122.4148  15  [{'b': 1, 'r': 1000, 's': 650}, {'b': 2, 'r': 1300, 's': 720}]
2  37.7836  -122.4127  10  [{'b': 1, 'r': 1100, 's': 600}]

CodePudding user response:

You can try converting the b, r and s columns to dictionaries first, then aggregating.

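# One dict per row, built from the b, r and s columns
df['new'] = df[['b', 'r', 's']].to_dict(orient='records')
# Per group: average v and collect the row dicts into a list
out = df.groupby(['id', 'lat', 'lon'], as_index=False).agg({'v': 'mean', 'new': list})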

print(out)

   id      lat       lon     v  \
0   1  37.7825 -122.4148  15.0
1   2  37.7836 -122.4127  10.0

                                                                new
0  [{'b': 1, 'r': 1000, 's': 650}, {'b': 2, 'r': 1300, 's': 720}]
1                                 [{'b': 1, 'r': 1100, 's': 600}]
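The question also asks about NaNs, which the snippet above does not handle yet. As a rough sketch, and assuming "do not store them" means leaving only the NaN entries out of each dict (one possible reading of the question), the dicts could be filtered after to_dict. The NaN placed in r below is hypothetical data added for illustration; it is not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same shape as the question, with one NaN added in 'r'.
df = pd.DataFrame({
    'id':  [1, 1, 2],
    'lat': [37.7825, 37.7825, 37.7836],
    'lon': [-122.4148, -122.4148, -122.4127],
    'v':   [10, 20, 10],
    'b':   [1, 2, 1],
    'r':   [1000, np.nan, 1100],
    's':   [650, 720, 600],
})

# Build one dict per row, keeping only the non-NaN entries.
records = df[['b', 'r', 's']].to_dict(orient='records')
df['new'] = [{k: v for k, v in rec.items() if pd.notna(v)} for rec in records]

# Same aggregation as above: mean of v, list of dicts per group.
out = df.groupby(['id', 'lat', 'lon'], as_index=False).agg({'v': 'mean', 'new': list})

# Drop dicts that ended up empty (rows where b, r and s were all NaN).
out['new'] = out['new'].apply(lambda dicts: [d for d in dicts if d])
print(out)
```

With this reading, the second row of id 1 still contributes a dict, just without the r key. Note that r comes back as a float (for example 1000.0), because a NaN forces the column to a float dtype.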

CodePudding user response：

You can use to_dict to convert the rows to dictionaries, and groupby.agg to combine them.

Your example does not contain any NaNs or duplicated combinations, but if it did, you could apply dropna/drop_duplicates on the b/r/s columns first.

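# to_dict('index') maps each row label to a {column: value} dict;
# wrapping it in a Series aligns those dicts back onto df's index.
(df
 .assign(new=pd.Series(df[['b','r','s']].to_dict('index')))
 .groupby('id')
 .agg({'lat': 'first',   # assumes lat/lon are constant within each id
       'lon': 'first',
       'v': 'mean',
       'new': list       # collect the per-row dicts into a list
        })
 )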

Output:

        lat       lon   v                                                new
id                                                                          
1   37.7825 -122.4148  15  [{'b': 1, 'r': 1000, 's': 650}, {'b': 2, 'r': ...
2   37.7836 -122.4127  10                    [{'b': 1, 'r': 1100, 's': 600}]
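The dropna/drop_duplicates suggestion above could look roughly like the sketch below. It makes a few assumptions that the question does not confirm: a row with a NaN in any of b, r or s is skipped entirely when building the dicts, v is still averaged over all rows, and duplicates are judged per id/lat/lon group. The extra rows in the data are hypothetical and were added only to exercise those cases.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: one row has a NaN in 'r',
# and one b/r/s combination is repeated within id 1.
df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2],
    'lat': [37.7825, 37.7825, 37.7825, 37.7836, 37.7836],
    'lon': [-122.4148, -122.4148, -122.4148, -122.4127, -122.4127],
    'v':   [10, 20, 30, 10, 20],
    'b':   [1, 2, 1, 1, 1],
    'r':   [1000, 1300, 1000, 1100, np.nan],
    's':   [650, 720, 650, 600, 610],
})

keys = ['id', 'lat', 'lon']
cols = ['b', 'r', 's']

# Mean of v uses every row, including rows with NaN in b/r/s.
v_mean = df.groupby(keys, as_index=False).agg({'v': 'mean'})

# Build dicts only from complete, de-duplicated b/r/s combinations.
clean = df.dropna(subset=cols).drop_duplicates(subset=keys + cols)
clean = clean.assign(new=clean[cols].to_dict(orient='records'))
dicts = clean.groupby(keys, as_index=False).agg({'new': list})

# A left merge keeps groups whose b/r/s rows were all dropped
# (their 'new' value would be NaN rather than a list).
out = v_mean.merge(dicts, on=keys, how='left')
print(out)
```

Computing the mean and the dict lists separately keeps the NaN filtering from changing v. If rows with NaNs should also be excluded from the mean, calling dropna on df once before a single groupby would be simpler.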