I have the following Pandas dataframe in Python:
ID Date
E105 28/4/2021
E105 28/2/2021
E105 23/12/2020
E105 29/11/2020
E076 7/7/2021
E076 20/6/2021
E076 26/5/2021
E076 8/4/2021
E076 3/3/2021
E076 3/2/2021
E076 13/1/2021
E076 23/12/2020
E066 2/6/2021
E066 8/5/2021
E066 8/4/2021
E066 17/1/2021
E066 23/12/2020
E066 2/12/2020
E066 14/11/2020
sorted by ID
and Date
and I would like to make a new column counting how many times that ID occurs before that day: i.e.:
ID Date number of times before that day
E105 28/4/2021 3
E105 28/2/2021 2
E105 23/12/2020 1
E105 29/11/2020 0
E076 7/7/2021 7
E076 20/6/2021 6
E076 26/5/2021 5
E076 8/4/2021 4
E076 3/3/2021 3
E076 3/2/2021 2
E076 13/1/2021 1
E076 23/12/2020 0
E066 2/6/2021 6
E066 8/5/2021 5
E066 8/4/2021 4
E066 17/1/2021 3
E066 23/12/2020 2
E066 2/12/2020 1
E066 14/11/2020 0
I think it has something to do with groupby
but I couldn't work it out.
CodePudding user response:
As the df is already sorted by Date (in descending order), it suffices to cumulatively count the occurrences of each ID in descending order (which can be achieved via ascending=False
parameter).
df['number of times before that day'] = df.groupby('ID').cumcount(ascending=False)