Home > Enterprise >  Entering df.groupby....max() result into a new column. Pandas
Entering df.groupby....max() result into a new column. Pandas

Time:11-14

My data is related to "Cricket", sports game (like Baseball). It has 20 overs for each inning max and each over has approx 6 balls.

data:

        season  match_id    inning  sum_total_runs  sum_total_wickets   over/ball   innings_score
32      2008    60          1       61              0                   5.1         0
33      2008    60          1       61              1                   5.2         0
34      2008    60          1       61              1                   5.3         0
35      2008    60          1       61              1                   5.4         0
36      2008    60          1       61              1                   5.5         0
...     ...     ...         ...     ...             ...                 ...         ...
179073  2019    11415       2       152             5                   19.2        0
179074  2019    11415       2       154             5                   19.3        0
179075  2019    11415       2       155             6                   19.4        0
179076  2019    11415       2       157             6                   19.5        0 
179077  2019    11415       2       157             7                   19.6        0

111972 rows × 7 columns

innings_score is new column created by me (given default value 0). I want to update it. The values that I want to enter in it are the results of df.groupby below.

In[]:
df.groupby(['season', 'match_id', 'inning'])['sum_total_runs'].max()

Out[]:
season  match_id  inning
2008    60        1         222
                  2          82
        61        1         240
                  2         207
        62        1         129
                           ... 
2019    11413     2         170
        11414     1         155
                  2         162
        11415     1         152
                  2         157
Name: sum_total_runs, Length: 1276, dtype: int64

I want innings_score to be like:

        season  match_id    inning  sum_total_runs  sum_total_wickets   over/ball   innings_score
32      2008    60          1       61              0                   5.1         222
33      2008    60          1       61              1                   5.2         222
34      2008    60          1       61              1                   5.3         222
35      2008    60          1       61              1                   5.4         222
36      2008    60          1       61              1                   5.5         222
...     ...     ...         ...     ...             ...                 ...         ...
179073  2019    11415       2       152             5                   19.2        157
179074  2019    11415       2       154             5                   19.3        157
179075  2019    11415       2       155             6                   19.4        157
179076  2019    11415       2       157             6                   19.5        157
179077  2019    11415       2       157             7                   19.6        157

111972 rows × 7 columns

CodePudding user response:

I would use assign. Starting from a simple example:

import pandas as pd

dt = pd.DataFrame({"name1":["A", "A", "B", "B", "C", "C"], "name2":["C", "C", "C", "D", "D", "D"], "value":[1, 2, 3, 4, 5, 6]})
grouping_variables = ["name1", "name2"]
dt = dt.set_index(grouping_variables)
dt = dt.assign(new_column=dt.groupby(grouping_variables)["value"].max())

As you can see, you set your grouping_variables as indeces before running the assignment.

You can always reset the index at the end if you don't want to keep the grouping_variables indexed dataframe:

dt.reset_index()

CodePudding user response:

One way is to set those 3 columns as index and assign the groupby result as a new column and reset index after that.

While those columns are index, the grouby result and the dataframe both have similar index, so pandas will automatically match and insert the correct rows in the correct positions. Then reset index will turn them back into normal columns.

Something like this:

In [46]: df
Out[46]:
   season  match_id  inning  sum_total_runs  sum_total_wickets  over/ball
0    2008        60       1              61                  0        5.1
1    2008        60       1              61                  1        5.2
2    2008        60       1              61                  1        5.3
3    2008        60       1              61                  1        5.4
4    2008        60       1              61                  1        5.5
5    2019     11415       2             152                  5       19.2
6    2019     11415       2             154                  5       19.3
7    2019     11415       2             155                  6       19.4
8    2019     11415       2             157                  6       19.5
9    2019     11415       2             157                  7       19.6

In [47]: df.set_index(['season', 'match_id', 'inning']).assign(innings_score=df.groupby(['season', 'match_id', 'inning'])['sum_total_runs'].max()).reset_index()
Out[47]:
   season  match_id  inning  sum_total_runs  sum_total_wickets  over/ball  innings_score
0    2008        60       1              61                  0        5.1             61
1    2008        60       1              61                  1        5.2             61
2    2008        60       1              61                  1        5.3             61
3    2008        60       1              61                  1        5.4             61
4    2008        60       1              61                  1        5.5             61
5    2019     11415       2             152                  5       19.2            157
6    2019     11415       2             154                  5       19.3            157
7    2019     11415       2             155                  6       19.4            157
8    2019     11415       2             157                  6       19.5            157
9    2019     11415       2             157                  7       19.6            157

  • Related