I'm kinda new to pandas and ran into the following line of code:
df.groupby(by=['id']).agg(lambda x: list(x))
Here, df is:
   id var_x var_y
0   1    xe    ye
1   1    xb    yb
2   1    xc    yc
3   2    xd    yd
4   3    xe    ye
5   1    xa    ya
6   2    xf    yf
It gives the (expected) result:
                var_x             var_y
id
1    [xe, xb, xc, xa]  [ye, yb, yc, ya]
2            [xd, xf]          [yd, yf]
3                [xe]              [ye]
The question is: can we be sure that the aggregates for each variable share the same order?
E.g. with id=1, is there any guarantee that we won't get [xe, xb, xc, xa] and [ya, ye, yc, yb] instead of [xe, xb, xc, xa] and [ye, yb, yc, ya]?
CodePudding user response:
short answer
Yes, the order is ensured.
documentation
The documentation of groupby indicates that you can choose whether to sort the group keys, but that in either case the order of the rows within each group is preserved:
sort : bool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
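As a quick check of that statement, here is a minimal sketch. It rebuilds the question's df, reorders its rows (the `reordered` frame and the chosen row order are only for illustration, not from the original post), and compares sort=True with sort=False. Only the order of the group keys should differ; the lists themselves should be identical.

```python
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 1, 2],
    "var_x": ["xe", "xb", "xc", "xd", "xe", "xa", "xf"],
    "var_y": ["ye", "yb", "yc", "yd", "ye", "ya", "yf"],
})

# Illustrative reordering so the ids first appear as 3, 2, 1,
# while rows with the same id keep their relative order
reordered = df.iloc[[4, 3, 0, 6, 1, 2, 5]]

sorted_keys = reordered.groupby("id", sort=True)["var_x"].agg(list)
unsorted_keys = reordered.groupby("id", sort=False)["var_x"].agg(list)

# sort only changes the order of the group keys in the result...
print(list(sorted_keys.index))
print(list(unsorted_keys.index))

# ...not the order of the rows collected within each group
print(sorted_keys.loc[1] == unsorted_keys.loc[1])
```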
details
groupby.agg passes one whole Series per group to the aggregation function: first every group of the first column, then every group of the next column, and so on.
You can check this by passing print as the aggregation function:
df.groupby(by=['id']).agg(print)
0 xe
1 xb
2 xc
5 xa
Name: var_x, dtype: object
3 xd
6 xf
Name: var_x, dtype: object
4 xe
Name: var_x, dtype: object
0 ye
1 yb
2 yc
5 ya
Name: var_y, dtype: object
3 yd
6 yf
Name: var_y, dtype: object
4 ye
Name: var_y, dtype: object
order within the lists
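The aggregation is equivalent to running list(Series) for each processed Series. list does not reorder the iterable it receives, so each list keeps the row order of its group. Since var_x and var_y are sliced from the same rows in the same order, their lists stay aligned element by element.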
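If you want to convince yourself on your own data, a small sanity check along these lines should work. It is only a sketch: the names `agg` and `pairs_by_id` are made up for this example. It zips the two aggregated lists back together and compares the pairs with the original rows of each group.

```python
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 1, 2],
    "var_x": ["xe", "xb", "xc", "xd", "xe", "xa", "xf"],
    "var_y": ["ye", "yb", "yc", "yd", "ye", "ya", "yf"],
})

agg = df.groupby(by=["id"]).agg(lambda x: list(x))

pairs_by_id = {}
for group_id, row in agg.iterrows():
    # Re-pair the aggregated lists element by element
    pairs = list(zip(row["var_x"], row["var_y"]))
    # The (var_x, var_y) pairs of the same group, taken from the original rows
    original = df.loc[df["id"] == group_id, ["var_x", "var_y"]]
    expected = list(original.itertuples(index=False, name=None))
    # Fails if the two lists were ever ordered differently
    assert pairs == expected
    pairs_by_id[group_id] = pairs

print(pairs_by_id[1])
```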