How does pandas transform work internally when passed a lambda?

Time:11-16

I found the following example online, which shows how to achieve the equivalent of SQL's PARTITION BY in pandas:

df['percent_of_points'] = df.groupby('team')['points'].transform(lambda x: x/x.sum())

#view updated DataFrame
print(df)

  team  points  percent_of_points
0    A      30           0.352941
1    A      22           0.258824
2    A      19           0.223529
3    A      14           0.164706
4    B      14           0.191781
5    B      11           0.150685
6    B      20           0.273973
7    B      28           0.383562

I struggle to understand what 'x' refers to in the lambda function lambda x: x/x.sum(). It appears to refer to an individual element when used as the numerator (i.e. 'x'), but also appears to be a list of values when used as the denominator (i.e. x.sum()).

I think I am not thinking about this in the right way, or I have a gap in my understanding of Python or pandas.

CodePudding user response:

"it appears to refer to an individual element when used as the numerator i.e. 'x' but also appears to be a list of values when used as a denominator i.e. x.sum()"

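It doesn't. In both places x is a pd.Series holding the values of one group, so its length is the size of that group. x.sum() is a single number, and x / x.sum() is not a single value: it is a pd.Series of the same size as x, with every element divided by the group total.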
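
One way to check this yourself is to replace the lambda with a named function that records what it receives. This is only a minimal sketch: the DataFrame is rebuilt from the question so the snippet runs on its own, and seen and share_of_group are hypothetical names introduced for illustration.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "points": [30, 22, 19, 14, 14, 11, 20, 28],
})

seen = []  # hypothetical helper list: records what transform passes in


def share_of_group(x):
    # x is the whole 'points' Series for one team, not a single number
    seen.append((type(x).__name__, len(x)))
    # Series divided by a scalar gives a Series of the same length
    return x / x.sum()


result = df.groupby("team")["points"].transform(share_of_group)

print(seen)    # type name and length of each x that was passed in
print(result)  # one value per original row
```

Each recorded entry should name a Series whose length is the size of the group, which is why x can be used both as the numerator and as the thing you call .sum() on.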

.transform then puts the values of each returned series back at the rows that group came from (matched by index), so the combined result has one value per row of the original column.
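
As a rough illustration of that behaviour (not the actual pandas implementation), you can reproduce the result by hand: apply the function to each group, glue the pieces together, and line them up with the original index. The names pieces, manual and via_transform are hypothetical, and the DataFrame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "points": [30, 22, 19, 14, 14, 11, 20, 28],
})

# Step 1: apply the function to each group's Series separately
pieces = [g / g.sum() for _, g in df.groupby("team")["points"]]

# Step 2: stitch the per-group results together and align them
# with the original row labels
manual = pd.concat(pieces).reindex(df.index)

# What transform does in a single call
via_transform = df.groupby("team")["points"].transform(lambda x: x / x.sum())

# Largest absolute difference between the two approaches
print((manual - via_transform).abs().max())
```

The two series should match row for row, which is what makes it safe to assign the transform result straight back into a new column of df.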

So, consider:

In [15]: df
Out[15]:
  team  points
0    A      30
1    A      22
2    A      19
3    A      14
4    B      14
5    B      11
6    B      20
7    B      28

In [16]: for k, g in df.groupby('team')['points']:
    ...:     print(g)
    ...:     print(g / g.sum())
    ...:
0    30
1    22
2    19
3    14
Name: points, dtype: int64
0    0.352941
1    0.258824
2    0.223529
3    0.164706
Name: points, dtype: float64
4    14
5    11
6    20
7    28
Name: points, dtype: int64
4    0.191781
5    0.150685
6    0.273973
7    0.383562
Name: points, dtype: float64
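
If the lambda still feels opaque, a variant that separates the two steps may help. This is a sketch of an alternative, not what the original example does: transform("sum") returns each group's total repeated on every row of that group, and the division then happens on two full-length columns. The names group_total and percent are hypothetical, and the DataFrame is rebuilt from the question.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "points": [30, 22, 19, 14, 14, 11, 20, 28],
})

# Each row gets the total of its own team: 85 for team A, 73 for team B
group_total = df.groupby("team")["points"].transform("sum")

# Plain element-wise division of two Series that share the same index
percent = df["points"] / group_total

print(group_total.tolist())
print(percent.round(6).tolist())
```

Here group_total plays the role of x.sum() and df["points"] plays the role of x, except that both are laid out over all rows instead of one group at a time.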
