This may be hard to understand for someone not familiar with cricket!
I have a pandas dataframe that gives me the history of a cricket game, ball by ball. Below is a simplified and considerably shorter version. It has columns for the over being bowled, as well as the ball in that over. It also has columns for the batter facing that ball, the score they got on that ball and whether they got out or not.
Over | Ball | Batter | Score | Out | |
---|---|---|---|---|---|
0 | 1 | 1 | Ponting | 0 | 0 |
1 | 1 | 2 | Ponting | 1 | 0 |
2 | 1 | 3 | McCullum | 4 | 0 |
3 | 1 | 4 | McCullum | 2 | 0 |
4 | 1 | 5 | McCullum | 1 | 0 |
5 | 1 | 6 | Ponting | 0 | 1 |
6 | 2 | 1 | McCullum | 3 | 0 |
7 | 2 | 2 | Ganguly | 2 | 0 |
8 | 2 | 3 | Ganguly | 0 | 0 |
9 | 2 | 4 | Ganguly | 3 | 0 |
10 | 2 | 5 | McCullum | 1 | 0 |
11 | 2 | 6 | Ganguly | 0 | 0 |
What I want to do is have another couple of columns that show the cumulative score for each batter after that ball. It would look like this:
Over | Ball | Batter | Score | Out | Ponting | McCullum | Ganguly | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Ponting | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 2 | Ponting | 1 | 0 | 1 | 0 | 0 |
2 | 1 | 3 | McCullum | 4 | 0 | 1 | 4 | 0 |
3 | 1 | 4 | McCullum | 2 | 0 | 1 | 6 | 0 |
4 | 1 | 5 | McCullum | 1 | 0 | 1 | 7 | 0 |
5 | 1 | 6 | Ponting | 0 | 1 | 1 | 7 | 0 |
6 | 2 | 1 | McCullum | 3 | 0 | 1 | 10 | 0 |
7 | 2 | 2 | Ganguly | 2 | 0 | 1 | 10 | 2 |
8 | 2 | 3 | Ganguly | 0 | 0 | 1 | 10 | 2 |
9 | 2 | 4 | Ganguly | 3 | 0 | 1 | 10 | 5 |
10 | 2 | 5 | McCullum | 1 | 0 | 1 | 11 | 5 |
11 | 2 | 6 | Ganguly | 0 | 0 | 1 | 11 | 5 |
Obviously this could be achieved by just iterating through the rows and updating the score for each player after every ball, but this would not be the most efficient method. I have tried things like df.groupby(['Ball','Batter']).cumsum()
but this is not getting me what I want. My feeling is that it is probably going to some combination of a pivot
and groupby
.
CodePudding user response:
IIUC:
dummies = pd.get_dummies(df.Batter).eq(1)
sums = df.groupby(['Batter']).Score.cumsum()
dummies.mul(sums, axis=0).where(dummies).pad().fillna(0, downcast='infer')
Ganguly McCullum Ponting
0 0 0 0
1 0 0 1
2 0 4 1
3 0 6 1
4 0 7 1
5 0 7 1
6 0 10 1
7 2 10 1
8 2 10 1
9 5 10 1
10 5 11 1
11 5 11 1
And subsequently
df.join(
dummies.mul(sums, axis=0).where(dummies).pad().fillna(0, downcast='infer')
)
Over Ball Batter Score Out Ganguly McCullum Ponting
0 1 1 Ponting 0 0 0 0 0
1 1 2 Ponting 1 0 0 0 1
2 1 3 McCullum 4 0 0 4 1
3 1 4 McCullum 2 0 0 6 1
4 1 5 McCullum 1 0 0 7 1
5 1 6 Ponting 0 1 0 7 1
6 2 1 McCullum 3 0 0 10 1
7 2 2 Ganguly 2 0 2 10 1
8 2 3 Ganguly 0 0 2 10 1
9 2 4 Ganguly 3 0 5 10 1
10 2 5 McCullum 1 0 5 11 1
11 2 6 Ganguly 0 0 5 11 1