Home > OS >  cumcount based on certain values sorted by date in Pandas dataframe
cumcount based on certain values sorted by date in Pandas dataframe

Time:09-02

I have the a dataframe df that looks like sorted by ID and Date in ascending and descending order respectively:

    ID  Date        Place
0   1   2021-04-28  1
1   1   2021-02-28  2
2   1   2020-12-23  11
3   1   2020-11-29  1
4   2   2021-07-07  1
5   2   2021-06-20  4
6   2   2021-05-26  8
7   2   2021-04-08  1
8   2   2021-03-03  3
9   2   2021-02-03  1
10  2   2021-01-13  9
11  2   2020-12-23  12
12  3   2021-06-02  1
13  3   2021-05-08  1
14  3   2021-04-08  9
15  3   2021-01-17  1
16  3   2020-12-23  4
17  3   2020-12-02  1
18  3   2020-11-14  2

I would like to add two new columns called Number of 1 which is defined to be cumulative count of number 1 occurred in Place for each ID and another column called Recent Number of 1 which is defined to be the cumcount of 1's in Place but the count restarts every year. So the outcome looks like

    ID  Date        Place  Number of 1  Recent Number of 1
0   1   2021-04-28  1      2            1
1   1   2021-02-28  2      1            0            
2   1   2020-12-23  11     1            1
3   1   2020-11-29  1      1            1
4   2   2021-07-07  1      3            3
5   2   2021-06-20  4      2            2
6   2   2021-05-26  8      2            2
7   2   2021-04-08  1      2            2
8   2   2021-03-03  3      1            1
9   2   2021-02-03  1      1            1
10  2   2021-01-13  9      0            0
11  2   2020-12-23  12     0            0
12  3   2021-06-02  1      4            2
13  3   2021-05-08  1      3            1
14  3   2021-04-08  9      2            1
15  3   2021-01-17  1      2            1
16  3   2020-12-23  4      1            1
17  3   2020-12-02  1      1            1
18  3   2020-11-14  2      0            0

I think for the recent number it should have something to do with

df.sort_values(by = ['ID', 'Date'], ascending = [True, False], inplace = True)

df['Date'] = df['Date'].dt.year

df.groupby(['ID','Date']).cumcount something Thank you so much for your help.

CodePudding user response:

here is one way to do it

the result don't match the expected outcome, as the first table in question don't match the table under expected.

The below is from the first table in the question

df['Date']=pd.to_datetime(df['Date'])
df['number of 1']=df.sort_values(['ID','Date'],ascending=[1,1])['Place'].eq(1).groupby(df['ID']).cumsum()

df['Recent number of 1']=df.sort_values(['ID','Date'],ascending=[1,1])['Place'].eq(1).groupby([df['ID'],df['Date'].dt.year]).cumsum()
df
    ID  Date    Place   number of 1     Recent number of 1
0   1   2021-04-28  8   2   1
1   1   2021-02-28  1   2   1
2   1   2020-12-23  11  1   1
3   1   2020-11-29  1   1   1
4   2   2021-07-07  1   3   3
5   2   2021-06-20  4   2   2
6   2   2021-05-26  8   2   2
7   2   2021-04-08  1   2   2
8   2   2021-03-03  3   1   1
9   2   2021-02-03  1   1   1
10  2   2021-01-13  9   0   0
11  2   2020-12-23  12  0   0
12  3   2021-06-02  1   3   2
13  3   2021-05-08  8   2   1
14  3   2021-04-08  9   2   1
15  3   2021-01-17  1   2   1
16  3   2020-12-23  4   1   1
17  3   2020-12-02  7   1   1
18  3   2020-11-14  1   1   1
  • Related