Pandas groupby apply list - Ensure it preserves order

Time:04-02

I'm kinda new to pandas and ran into the following line of code:

df.groupby(by=['id']).agg(lambda x: list(x))

Here df is:

   id var_x var_y
0   1   xe  ye
1   1   xb  yb
2   1   xc  yc
3   2   xd  yd
4   3   xe  ye
5   1   xa  ya
6   2   xf  yf

It gives the (expected) result:

               var_x               var_y
id
1   [xe, xb, xc, xa]    [ye, yb, yc, ya]
2           [xd, xf]            [yd, yf]
3               [xe]                [ye]

The question is: can we be sure that the aggregates for each variable share the same order? E.g. with id=1, is there any guarantee that we won't get [xe, xb, xc, xa] and [ya, ye, yc, yb] instead of [xe, xb, xc, xa] and [ye, yb, yc, ya]?

CodePudding user response:

short answer

Yes, the order is guaranteed: each list follows the original row order within its group.

documentation

The groupby documentation says that the sort parameter only controls the order of the group keys; the order of the rows within each group is preserved either way.

sort : bool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
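To see what the sort parameter does and does not affect, here is a small sketch. It assumes the df from the question, rebuilt by hand, and reverses the rows (the name rev is just for illustration) so that the first id encountered is 2 rather than 1.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 1, 2],
    "var_x": ["xe", "xb", "xc", "xd", "xe", "xa", "xf"],
    "var_y": ["ye", "yb", "yc", "yd", "ye", "ya", "yf"],
})

# Reverse the rows so the ids first appear in the order 2, 1, 3.
rev = df.iloc[::-1]

# sort=True orders the group keys; sort=False keeps first-appearance order.
sorted_keys = rev.groupby("id", sort=True)["var_x"].agg(list)
unsorted_keys = rev.groupby("id", sort=False)["var_x"].agg(list)

print(list(sorted_keys.index))    # group keys in sorted order
print(list(unsorted_keys.index))  # group keys in order of first appearance

# Within each group the rows keep the order they have in rev,
# whichever value of sort is used.
for key in sorted_keys.index:
    print(key, sorted_keys.loc[key], unsorted_keys.loc[key])
```

Only the order of the keys differs between the two results; the list for each id is the same in both.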

details

groupby.agg works column by column: for a given column it passes each group's values to the aggregation function as a Series, one group at a time, then does the same for the next column.

You can check this by running print:

df.groupby(by=['id']).agg(print)

0    xe
1    xb
2    xc
5    xa
Name: var_x, dtype: object
3    xd
6    xf
Name: var_x, dtype: object
4    xe
Name: var_x, dtype: object
0    ye
1    yb
2    yc
5    ya
Name: var_y, dtype: object
3    yd
6    yf
Name: var_y, dtype: object
4    ye
Name: var_y, dtype: object

order within the lists

The lambda is equivalent to calling list(Series) on each Series it receives. list does not change the order of the iterable it is given, so each list keeps the row order of its group, and the lists for var_x and var_y line up element by element.
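If you want to convince yourself on your own data, one possible check (a sketch, not the only way to do it) is to zip the aggregated lists back together and compare them with the (var_x, var_y) pairs read straight from the original rows. The names result, pairs and expected are only illustrative.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 1, 2],
    "var_x": ["xe", "xb", "xc", "xd", "xe", "xa", "xf"],
    "var_y": ["ye", "yb", "yc", "yd", "ye", "ya", "yf"],
})

result = df.groupby(by=["id"]).agg(lambda x: list(x))

pairs = {}
for key, row in result.iterrows():
    # Pair the two aggregated lists position by position.
    pairs[key] = list(zip(row["var_x"], row["var_y"]))

    # The same pairs, taken directly from the original rows of this group.
    group_rows = df.loc[df["id"] == key, ["var_x", "var_y"]]
    expected = list(group_rows.itertuples(index=False, name=None))

    # If the lists were ordered differently, this would fail.
    assert pairs[key] == expected

print(pairs)
```

Every xe stays next to its ye, xb next to yb, and so on, which is what the question was asking about.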
