Summing each play of a player in a pandas dataframe

Time:10-20

I have a pandas DataFrame, shown below, that tabulates the scores of two players. I want to calculate each player's total for each game, where a game is a run of consecutive rows for the same player. For example, the first game played by A has a total score of 12, the second game played by A has a total score of 10, the first game played by B has a total score of 4, and so on. How can I do this the pandas way (vectorised, groupby, etc.)?

df_players.groupby("Player")["Score"].sum() only gives the overall total per player, not the total for each individual game.

Many thanks.

Player Score
A 10
A 2
B 1
B 3
A 3
A 7
B 2

CodePudding user response:

You don't have a Game column in your DataFrame. I assume the first two scores in your table are for Player A in Game #1, but I'm only guessing that because you said you expected the result to be 12. Nothing in the data you provided states this explicitly. Add a Game column to the DataFrame and then group by player and game: the by= parameter of groupby() can take a list of columns to group by.
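
A minimal sketch of that suggestion. The Game column and its values are an assumption added purely for illustration (they are not in the question's data); they label each player's first run of rows as game 1 and the second as game 2.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical "Game" column
# (assumed labels, not part of the original table).
df_players = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Game": [1, 1, 1, 1, 2, 2, 2],
        "Score": [10, 2, 1, 3, 3, 7, 2],
    }
)

# Passing a list to groupby() groups by every (Player, Game) combination.
totals = df_players.groupby(["Player", "Game"])["Score"].sum()
print(totals)
```

With these assumed labels, A's totals come out as 12 and 10, and B's as 4 and 2.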

CodePudding user response:

I think you want cumsum(), which computes a cumulative (running) sum within each group.

df_players.groupby('Player').cumsum()

You can add this as a new column to the existing frame like:

df_players['Running Score'] = df_players.groupby('Player')['Score'].cumsum()
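
As a quick check, here is a sketch of what that produces on the question's sample data. Note that the running total keeps accumulating across all of a player's rows, so it does not reset at game boundaries.

```python
import pandas as pd

# Sample data from the question.
df_players = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Score": [10, 2, 1, 3, 3, 7, 2],
    }
)

# Running total per player, aligned to the original row order.
df_players["Running Score"] = df_players.groupby("Player")["Score"].cumsum()
print(df_players)
```

The last row for each player holds that player's overall total (22 for A, 6 for B), not a per-game total.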

CodePudding user response:

This is a gaps-and-islands problem. Whenever the Player changes from one row to the next, a new island is formed.

# df is the question's DataFrame (df_players).
# Build the islands. If the current row's Player is not equal to the previous
# row's Player, create a new island. We don't care about the island number, only
# that rows with consecutive Player are assigned to the same island.
island = df["Player"].ne(df["Player"].shift()).cumsum().rename("Island")

result = (
    # One Player + one Island = one Game
    df.groupby([df["Player"], island]).sum()
    # Now, for each Player, label the Games consecutively
    .assign(Game=lambda x: x.groupby("Player").cumcount() + 1)
    .set_index("Game", append=True)
)

Result:

                    Score
Player Island Game       
A      1      1        12
       3      2        10
B      2      1         4
       4      2         2