Home > Mobile >  More elegant way to cumcount based on a condition in python pandas
More elegant way to cumcount based on a condition in python pandas

Time:11-24

I have the following df consisting of two teams and their point differentials for games they have played.

df = pd.DataFrame({
    'Team':['A','A','A','A','A','B','B','B','B','B'],
    'Point_Diff':[-4,5,-1,2,2,6,-5,-4,3,-1]}
    )
df


  Team  Point_Diff
0   A   -4
1   A   5
2   A   -1
3   A   2
4   A   2
5   B   6
6   B   -5
7   B   -4
8   B   3
9   B   -1

I want to create a variable Losses which is a cumulative count only where Point_Diff is negative (since that means the team lost).

I have been able to do this, albeit kind of hacky and in multiple lines:

df['Losses'] = df.loc[df.Point_Diff<0].groupby(['Team'])[['Point_Diff']].transform('cumcount') 1

df.Losses = df.Losses.fillna(method='ffill')

df


  Team  Point_Diff  Losses
0   A   -4  1.0
1   A   5   1.0
2   A   -1  2.0
3   A   2   2.0
4   A   2   2.0
5   B   6   2.0
6   B   -5  1.0
7   B   -4  2.0
8   B   3   2.0
9   B   -1  3.0

There has to be a more elegant way to do this, likely with shift() or something, but was unable to figure anything out in terms of incorporating it with a groupby(). I would assume there has to be a one liner that can do this and would appreciate if anyone could provide input. Again, I have solved my initial task but now i just wish to clean up the code a bit.

EDIT: I'm also noticing my code doesn't even work. There is an error when team B starts in row 5. Losses should be 0, not 2. Not sure why that is.

CodePudding user response:

Not sure if this is what you have in mind:

df.assign(Losses=df.Point_Diff.lt(0).groupby('Team').cumsum())

  Team  Point_Diff  Losses
0    A          -4       1
1    A           5       1
2    A          -1       2
3    A           2       2
4    A           2       2
5    B           6       0
6    B          -5       1
7    B          -4       2
8    B           3       2
9    B          -1       3

CodePudding user response:

It can be shortened:

df = df.assign(Losses=df[df.Point_Diff<0].groupby(['Team'])[['Point_Diff']].cumcount().add(1)).ffill()

Output:

>>> df
  Team  Point_Diff  Losses
0    A          -4     1.0
1    A           5     1.0
2    A          -1     2.0
3    A           2     2.0
4    A           2     2.0
5    B           6     2.0 # <-- Still 2, though.
6    B          -5     1.0
7    B          -4     2.0
8    B           3     2.0
9    B          -1     3.0
  • Related