Home > Blockchain >  Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value
Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value

Time:03-29

Let's say I have a df

pd.DataFrame(
    {'name':['pam','pam','bob','bob','pam','bob','pam','bob'],
     'game_id':[0,0,1,1,0,2,1,2]
     }
    )


   name game_id
0   pam 0
1   pam 0
2   bob 1
3   bob 1
4   pam 0
5   bob 2
6   pam 1
7   bob 2

I want to calculate how many games bob and amy have appeared in cumulatively. However, when I use .groupby() and .cumcount() 1, I get something different. I get a cumulative count within each game_id:

df['games'] = df.groupby(['name','game_id']).cumcount() 1

    name game_id games
0   pam 0   1
1   pam 0   2
2   bob 1   1
3   bob 1   2
4   pam 0   3
5   bob 2   1
6   pam 1   1
7   bob 2   2

When what I really want is a one total cumulative count rather than a cumulative count for each unique game_id. Here's an example of my desired output:

    name game_id games
0   pam 0   1
1   pam 0   1
2   bob 1   1
3   bob 1   1
4   pam 0   1
5   bob 2   2
6   pam 1   2
7   bob 2   2

Note, in my actual dataset game_id is a random sequence of numbers.

CodePudding user response:

Lets try sort df, check consecutive difference, create new group by cumsum and then resort the df

new_df=df.sort_values(by=['name','game_id'])
new_df=new_df.assign(rank=new_df['game_id']!=new_df['game_id'].shift())
new_df=new_df.assign(rank=new_df.groupby('name')['rank'].cumsum()).sort_index()
print(new_df)



 name  game_id  rank
0  pam        0     1
1  pam        0     1
2  bob        1     1
3  bob        1     1
4  pam        0     1
5  bob        2     2
6  pam        1     2
7  bob        2     2
  • Related