Pandas Count number of previous rows based on column value and condition-CodePudding

In Pandas I am trying create a column that counts all previous rows based on two column values.

The idea is to sort the data by animal and by date and then use pandas shift function to calculate the variance between an animals current date and previous date(the sample data below already has this done).

Then I want to create a new column called Previous_Count that is a rolling count of previous records based on value in Animal column and resets if the Date_Variance field is larger than 10.

Sample Data Code:

import pandas as pd

data = {'Date': ['2021-01-01','2021-01-05','2021-01-10','2021-01-11','2021-01-04','2021-01-08','2021-01-20','2021-01-21','2021-01-25','2021-02-12','2021-02-14'],
        'Date_Variance': [0,4,5,1,0,4,0,1,4,18,2],
        'Animal': ['Dog','Dog','Dog','Dog','Cat','Cat','Lizard','Lizard','Lizard','Lizard','Lizard'],
        #'Previous_count': [0,1,2,3,0,1,0,1,2,0,1]
        }

df = pd.DataFrame(data)

Sample Data Output with Previous_Count column:

Date       Date_Variance    Animal  Previous_count
1/1/2021    0               Dog     0
1/5/2021    4               Dog     1
1/10/2021   5               Dog     2
1/11/2021   1               Dog     3
1/4/2021    0               Cat     0
1/8/2021    4               Cat     1
1/20/2021   0               Lizard  0
1/21/2021   1               Lizard  1
1/25/2021   4               Lizard  2
2/12/2021   18              Lizard  0
2/14/2021   2               Lizard  1

CodePudding user response：

Try:

grouper = df.groupby("Animal")["Date_Variance"].apply(lambda x: x.gt(10).cumsum())
df["Previous_count"] = df.groupby(["Animal", grouper]).cumcount()

>>> df
          Date  Date_Variance  Animal  Previous_count
0   2021-01-01              0     Dog               0
1   2021-01-05              4     Dog               1
2   2021-01-10              5     Dog               2
3   2021-01-11              1     Dog               3
4   2021-01-04              0     Cat               0
5   2021-01-08              4     Cat               1
6   2021-01-20              0  Lizard               0
7   2021-01-21              1  Lizard               1
8   2021-01-25              4  Lizard               2
9   2021-02-12             18  Lizard               0
10  2021-02-14              2  Lizard               1

CodePudding user response：

Use GroupBy.cumcount. Here we use Series.cumsum to get a new group whenever Date_Variance is greater than 10

df['Previous_count'] = df.groupby(['Animal',
                                   df['Date_Variance'].gt(10).cumsum()]).cumcount()

print(df)

          Date  Date_Variance  Animal  Previous_count
0   2021-01-01              0     Dog               0
1   2021-01-05              4     Dog               1
2   2021-01-10              5     Dog               2
3   2021-01-11              1     Dog               3
4   2021-01-04              0     Cat               0
5   2021-01-08              4     Cat               1
6   2021-01-20              0  Lizard               0
7   2021-01-21              1  Lizard               1
8   2021-01-25              4  Lizard               2
9   2021-02-12             18  Lizard               0
10  2021-02-14              2  Lizard               1

print(df['Date_Variance'].gt(10).cumsum())
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     1
10    1
Name: Date_Variance, dtype: int64