I have the following df
consisting of two teams and their point differentials for games they have played.
df = pd.DataFrame({
'Team':['A','A','A','A','A','B','B','B','B','B'],
'Point_Diff':[-4,5,-1,2,2,6,-5,-4,3,-1]}
)
df
Team Point_Diff
0 A -4
1 A 5
2 A -1
3 A 2
4 A 2
5 B 6
6 B -5
7 B -4
8 B 3
9 B -1
I want to create a variable Losses
which is a cumulative count only where Point_Diff
is negative (since that means the team lost).
I have been able to do this, albeit kind of hacky and in multiple lines:
df['Losses'] = df.loc[df.Point_Diff<0].groupby(['Team'])[['Point_Diff']].transform('cumcount') 1
df.Losses = df.Losses.fillna(method='ffill')
df
Team Point_Diff Losses
0 A -4 1.0
1 A 5 1.0
2 A -1 2.0
3 A 2 2.0
4 A 2 2.0
5 B 6 2.0
6 B -5 1.0
7 B -4 2.0
8 B 3 2.0
9 B -1 3.0
There has to be a more elegant way to do this, likely with shift()
or something, but was unable to figure anything out in terms of incorporating it with a groupby()
. I would assume there has to be a one liner that can do this and would appreciate if anyone could provide input. Again, I have solved my initial task but now i just wish to clean up the code a bit.
EDIT: I'm also noticing my code doesn't even work. There is an error when team B starts in row 5. Losses
should be 0, not 2. Not sure why that is.
CodePudding user response:
Not sure if this is what you have in mind:
df.assign(Losses=df.Point_Diff.lt(0).groupby('Team').cumsum())
Team Point_Diff Losses
0 A -4 1
1 A 5 1
2 A -1 2
3 A 2 2
4 A 2 2
5 B 6 0
6 B -5 1
7 B -4 2
8 B 3 2
9 B -1 3
CodePudding user response:
It can be shortened:
df = df.assign(Losses=df[df.Point_Diff<0].groupby(['Team'])[['Point_Diff']].cumcount().add(1)).ffill()
Output:
>>> df
Team Point_Diff Losses
0 A -4 1.0
1 A 5 1.0
2 A -1 2.0
3 A 2 2.0
4 A 2 2.0
5 B 6 2.0 # <-- Still 2, though.
6 B -5 1.0
7 B -4 2.0
8 B 3 2.0
9 B -1 3.0