Cumcount and groupby with multiple columns

Time:09-21

I have a dataframe:

  id    value    Date
0  a       w     2022-09-30
1  a       l     2022-08-01
2  a       l     2022-07-15
3  a       w     2022-07-11
4  a       w     2022-06-12
5  a       w     2022-06-11
6  a       l     2022-06-10
7  b       l     2022-07-18
8  b       w     2022-07-14
9  b       l     2022-07-11
10 b       w     2022-07-01

When I run df.groupby("id").cumcount(ascending=False), it returns:

0  6       
1  5       
2  4       
3  3       
4  2       
5  1       
6  0
7  3
8  2    
9  1    
10 0    

However, I want a separate running count of w and of l within each id, counting only the rows with an earlier date, so the desired output would look like:

  id    value    Date        cumcount_w  cumcount_l
0  a       w     2022-09-30  3           3
1  a       l     2022-08-01  3           2
2  a       l     2022-07-15  3           1
3  a       w     2022-07-11  2           1
4  a       w     2022-06-12  1           1  # there is 1 w and 1 l before this date
5  a       w     2022-06-11  0           1  # there is 1 l before this date
6  a       l     2022-06-10  0           0  # both start with zero
7  b       l     2022-07-18  2           1
8  b       w     2022-07-14  1           1
9  b       l     2022-07-11  1           0
10 b       w     2022-07-01  0           0  

How can I do that with the cumcount function?

CodePudding user response:

This seems to work (I only handled cumcount_w):

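import pandas as pd

# 1 where the row is a "w", 0 otherwise
df["is_w"] = (df["value"] == "w").astype(int)

# running count of "w" per id, top to bottom (includes the current row)
df["cumsum_w"] = df.groupby("id")["is_w"].cumsum()

# total number of "w" in each id
df_group = df.groupby("id")["cumsum_w"].max().reset_index().rename(columns={"cumsum_w": "max_count_group_w"})

df = pd.merge(left=df, right=df_group, on="id")

# number of "w" rows below the current row, i.e. with an earlier date
df["cumcount_w"] = df["max_count_group_w"] - df["cumsum_w"]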
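
The same idea can probably be extended to both columns without the helper columns and the merge. What follows is only a sketch: the frame is rebuilt by hand from the question, and it assumes the rows are already ordered newest-first within each id, as they are there. For each value, the per-id total minus the running sum gives the number of matching rows that come later in the frame, i.e. that have an earlier date.

```python
import pandas as pd

# Frame rebuilt by hand from the question (rows newest-first within each id).
df = pd.DataFrame({
    "id": list("aaaaaaabbbb"),
    "value": list("wllwwwllwlw"),
    "Date": pd.to_datetime([
        "2022-09-30", "2022-08-01", "2022-07-15", "2022-07-11",
        "2022-06-12", "2022-06-11", "2022-06-10",
        "2022-07-18", "2022-07-14", "2022-07-11", "2022-07-01",
    ]),
})

for v in ["w", "l"]:
    # 1 where the row holds this value, 0 otherwise
    flag = df["value"].eq(v).astype(int)
    grouped = flag.groupby(df["id"])
    # per-id total minus the running count (which includes the current row)
    # leaves the number of matching rows below the current one
    df[f"cumcount_{v}"] = grouped.transform("sum") - grouped.cumsum()

print(df)
```

If the rows are not already in that order, sorting by id and by Date descending beforehand should restore the assumption.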