I have a dataframe:
id value Date
0 a w 2022-09-30
1 a l 2022-08-01
2 a l 2022-07-15
3 a w 2022-07-11
4 a w 2022-06-12
5 a w 2022-06-11
6 a l 2022-06-10
7 b l 2022-07-18
8 b w 2022-07-14
9 b l 2022-07-11
10 b w 2022-07-01
When I run df.groupby("id").cumcount(ascending=False), it returns:
0 6
1 5
2 4
3 3
4 2
5 1
6 0
7 3
8 2
9 1
10 0
However, I want a separate cumulative count of w and of l within each id: for every row, the number of w rows and l rows with an earlier date in the same id. The desired output would look like this:
id value Date cumcount_w cumcount_l
0 a w 2022-09-30 3 3
1 a l 2022-08-01 3 2
2 a l 2022-07-15 3 1
3 a w 2022-07-11 2 1
4 a w 2022-06-12 1 1 # there is 1 w and 1 l before this date
5 a w 2022-06-11 0 1 # there is 1 l before this date
6 a l 2022-06-10 0 0 # both start with zero
7 b l 2022-07-18 2 1
8 b w 2022-07-14 1 1
9 b l 2022-07-11 1 0
10 b w 2022-07-01 0 0
How can I do that with the cumcount function?
CodePudding user response:
This seems to work (I only handled cumcount_w; cumcount_l follows the same pattern):
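import pandas as pd

# Flag the "w" rows, then take a running total within each id (rows are newest-first)
df["is_w"] = (df["value"] == "w").astype(int)
df["cumsum_w"] = df.groupby("id")["is_w"].cumsum()
# Total number of "w" rows per id
df_group = df.groupby("id")["cumsum_w"].max().reset_index().rename(columns={"cumsum_w": "max_count_group_w"})
df = pd.merge(left=df, right=df_group, on="id")
# Total minus the running total = number of "w" rows on earlier dates
df["cumcount_w"] = df["max_count_group_w"] - df["cumsum_w"]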
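The same idea can probably be extended to both columns without the merge step. Below is a minimal sketch, assuming the rows are already sorted newest-first within each id (as in the question). The helper name add_prior_counts and its labels parameter are made up for illustration and are not part of pandas. It uses transform("sum") for the per-id total and subtracts the inclusive running total.

```python
import pandas as pd

# Sample data from the question, sorted newest-first within each id.
df = pd.DataFrame({
    "id": list("aaaaaaabbbb"),
    "value": list("wllwwwllwlw"),
    "Date": pd.to_datetime([
        "2022-09-30", "2022-08-01", "2022-07-15", "2022-07-11",
        "2022-06-12", "2022-06-11", "2022-06-10",
        "2022-07-18", "2022-07-14", "2022-07-11", "2022-07-01",
    ]),
})


def add_prior_counts(frame, labels=("w", "l")):
    """Hypothetical helper: for each label, count matching rows that sit
    below the current row within the same id (assumes newest-first order)."""
    out = frame.copy()
    for label in labels:
        # 1 where the row carries this label, 0 otherwise.
        flag = (out["value"] == label).astype(int)
        grouped = flag.groupby(out["id"])
        # Per-id total minus the inclusive top-down running total leaves
        # the number of matching rows strictly below the current row.
        out[f"cumcount_{label}"] = grouped.transform("sum") - grouped.cumsum()
    return out


result = add_prior_counts(df)
print(result)
```

If the frame is not sorted this way, sorting first with df.sort_values(["id", "Date"], ascending=[True, False]) should restore the assumption.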