I have a pandas df, shown below, that tabulates the scores of two players. I want to calculate the total score of each game for each player, where a game is a run of consecutive rows for the same player. For example, the first game played by A has a total score of 12, the second game played by A has a total score of 10, the first game played by B has a total score of 4, etc. How can I do this the pandas way (vectorised, groupby, etc.), please?
df_players.groupby("Player")["Score"].sum()
only gives the overall total score for each player, not the total for each game.
Many thanks.
Player | Score |
---|---|
A | 10 |
A | 2 |
B | 1 |
B | 3 |
A | 3 |
A | 7 |
B | 2 |
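To make the question reproducible, here is a minimal sketch that rebuilds the table above as `df_players` (the name used in the question) and shows what grouping by `Player` alone returns: one total per player, with all of that player's games collapsed together.

```python
import pandas as pd

# Rebuild the sample table from the question.
df_players = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Score": [10, 2, 1, 3, 3, 7, 2],
    }
)

# Grouping by Player alone sums every row for that player, across all games.
per_player = df_players.groupby("Player")["Score"].sum()
print(per_player)
```

Here A gets 10 + 2 + 3 + 7 = 22 and B gets 1 + 3 + 2 = 6, which is why a plain per-player groupby can't separate the games.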
Answer:
Your DataFrame doesn't have a Game column. I assume the first two scores in your table belong to Player A's game #1, but I'm only guessing that from your expected result of 12; nothing in the data you provided says which game each row belongs to. Add a Game column to the DataFrame and then group by player and game. The by= parameter of groupby() accepts a list of columns to group by.
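A minimal sketch of that suggestion, assuming you fill in a `Game` column yourself. The `Game` values below are hypothetical, entered by hand to match the totals described in the question.

```python
import pandas as pd

df_players = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Score": [10, 2, 1, 3, 3, 7, 2],
        # Hypothetical Game column, filled in by hand for illustration.
        "Game": [1, 1, 1, 1, 2, 2, 2],
    }
)

# by= accepts a list of column labels; each (Player, Game) pair becomes one group.
per_game = df_players.groupby(by=["Player", "Game"])["Score"].sum()
print(per_game)
```

The result is a Series indexed by (Player, Game): A's games total 12 and 10, B's total 4 and 2.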
Answer:
I think you want cumsum() for a cumulative sum:
df_players.groupby('Player')['Score'].cumsum()
You can add this as a new column of the existing frame:
df_players['Running Score'] = df_players.groupby('Player')['Score'].cumsum()
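A runnable sketch of this answer on the question's data. Selecting `['Score']` makes the groupby return a Series, which fits cleanly into a single new column. Note what it computes: a running total per player across all of that player's rows, not a total that restarts for each game.

```python
import pandas as pd

df_players = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Score": [10, 2, 1, 3, 3, 7, 2],
    }
)

# Running total of Score within each Player, in row order.
df_players["Running Score"] = df_players.groupby("Player")["Score"].cumsum()
print(df_players)
```

For A the running score goes 10, 12, 15, 22, so A's second game (3 + 7) keeps adding to the first game's total instead of starting again at 0.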
Answer:
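This is a gaps-and-islands problem: each run of consecutive rows for the same Player is an island (one game), and whenever the Player changes from one row to the next, a new island starts.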
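To see what the island labels look like on the sample data, here is a small sketch that keeps the intermediate "changed" flags. The `changed` name is just for illustration. Comparing each row with the previous one via `shift()` flags the start of every run, and `cumsum()` turns those flags into a run id.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Score": [10, 2, 1, 3, 3, 7, 2],
    }
)

# True where the Player differs from the previous row (row 0 is compared with NaN, so True).
changed = df["Player"].ne(df["Player"].shift())
# Counting the True flags so far gives one id per consecutive run.
island = changed.cumsum().rename("Island")

print(pd.concat([df, changed.rename("Changed"), island], axis=1))
```

The islands come out as 1, 1, 2, 2, 3, 3, 4: A's two games are islands 1 and 3, and B's are islands 2 and 4.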
import pandas as pd

df = pd.DataFrame({"Player": list("AABBAAB"), "Score": [10, 2, 1, 3, 3, 7, 2]})

# Build the islands. If the current row's Player is not equal to the previous
# row's Player, start a new island. The island number itself doesn't matter, only
# that consecutive rows with the same Player are assigned to the same island.
island = df["Player"].ne(df["Player"].shift()).cumsum().rename("Island")
result = (
    # One Player on one Island = one Game
    df.groupby(["Player", island])[["Score"]].sum()
    # Now, for each Player, number the Games consecutively
    .assign(Game=lambda x: x.groupby("Player").cumcount() + 1)
    .set_index("Game", append=True)
)
print(result)
Result:
                    Score
Player Island Game
A      1      1        12
       3      2        10
B      2      1         4
       4      2         2
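If you'd rather have one row per game in the order the games were played, without the Island level in the index, the same islands idea can be written by grouping on the island alone. This is a sketch of a variant, not part of the answer above: it takes each run's player with `"first"` (safe because every row in an island has the same player) and then numbers the games per player with `cumcount()`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Player": ["A", "A", "B", "B", "A", "A", "B"],
        "Score": [10, 2, 1, 3, 3, 7, 2],
    }
)

# Same island ids as above; named "Island" so it doesn't clash with the Player column.
island = df["Player"].ne(df["Player"].shift()).cumsum().rename("Island")

# One row per island: the run's player and its summed score (named aggregation).
games = df.groupby(island).agg(Player=("Player", "first"), Score=("Score", "sum"))
# Number each player's games in order of appearance.
games["Game"] = games.groupby("Player").cumcount() + 1
games = games.reset_index(drop=True)
print(games)
```

The rows come out in play order: A game 1 (12), B game 1 (4), A game 2 (10), B game 2 (2).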