Pandas forward looking rolling count in groupby to count until last row in group

Time:09-10

I have the following sample data set, which includes my expected output column. My real data contains empty values, so I used np.nan in the sample to represent them.

import numpy as np
import pandas as pd

data = {'Store': [1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
        'Week': [1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,11,12],
        'CopyCheck': [10,10,np.nan,np.nan,10,10,10,np.nan,10,np.nan,10,10,10,10,np.nan,np.nan,10,10,10],
        'Expected Next3CopyCount': [2,1,1,2,3,2,1,1,2,2,3,3,2,1,1,2,3,2,1]}
df = pd.DataFrame.from_dict(data)
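The expected column follows a simple rule: for each row, count the non-NaN CopyCheck values in that row and the next two rows of the same Store, stopping at the end of the group. A brute-force sketch of that rule, useful as a reference to test faster versions against, is below. `next_n_count` is a hypothetical helper, and the loop assumes the rows of each Store are contiguous and already ordered by Week.

```python
import numpy as np
import pandas as pd

data = {'Store': [1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
        'Week': [1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,11,12],
        'CopyCheck': [10,10,np.nan,np.nan,10,10,10,np.nan,10,np.nan,10,10,10,10,np.nan,np.nan,10,10,10],
        'Expected Next3CopyCount': [2,1,1,2,3,2,1,1,2,2,3,3,2,1,1,2,3,2,1]}
df = pd.DataFrame(data)

def next_n_count(values, n=3):
    """Hypothetical helper: count non-NaN values in values[i : i + n] for each i.

    Slicing past the end just yields a shorter window, so the last rows
    of a group count whatever rows remain instead of returning NaN.
    """
    values = np.asarray(values, dtype=float)
    return [int(np.count_nonzero(~np.isnan(values[i:i + n])))
            for i in range(len(values))]

# Apply the reference count to each Store separately, in order of appearance.
# Assumes each Store's rows are contiguous, so the list lines up with df's rows.
reference = []
for _, group in df.groupby('Store', sort=False):
    reference.extend(next_n_count(group['CopyCheck'].to_numpy()))

print(reference == df['Expected Next3CopyCount'].tolist())  # True
```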

I tried the code below to do a forward-looking rolling count of the non-empty rows in the dataframe (including the current row). Near the end of each group, where fewer than 3 rows remain, my code returns 0 or NaN instead of counting the rows that are left. How can I adjust my code to get the result I am looking for?

EDIT: I also noticed that the rolling count spills into the next group and adds false counts to the last rows of Store 1. Why doesn't the groupby on Store keep the groups separate?

df['Next3Copies'] = df['CopyCheck'].groupby(df['Store']).rolling(3,min_periods=1).count().shift(-2).values
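On the EDIT: the groupby only applies to the rolling step. The result of `.count()` is a single Series indexed by (Store, row), and the trailing `.shift(-2)` runs over that whole Series, so the first counts of Store 2 slide up into the last two rows of Store 1. The minimal illustration below, assuming the sample data above and not taken from the original answer, compares that with shifting inside each Store via `groupby(level='Store').shift(-2)`. Per-group shifting removes the leak but leaves NaN in the last two rows of every group, which is the 0/NaN problem described in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Store': [1] * 7 + [2] * 12,
    'CopyCheck': [10,10,np.nan,np.nan,10,10,10,np.nan,10,np.nan,10,10,10,10,np.nan,np.nan,10,10,10],
})

# Trailing (backward-looking) count per Store.
# The result is indexed by (Store, original row label).
trailing = df.groupby('Store')['CopyCheck'].rolling(3, min_periods=1).count()

# Series.shift ignores the Store level, so Store 1's last two rows
# receive counts that were computed for the first rows of Store 2.
leaky = trailing.shift(-2)

# Shifting within each Store stops the leak, but the last two rows of
# every group have nothing to shift in and become NaN.
per_group = trailing.groupby(level='Store').shift(-2)

print(leaky.loc[1].tail(2).tolist())
print(per_group.loc[1].tail(2).tolist())
```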

Answer:

One way to roll forward is to reverse the rows with iloc[::-1]; after that, the groupby.rolling.count is essentially what you already wrote. Because the result is indexed by (Store, original index), sort_index(level=1) restores the original row order, and to_numpy() gives a plain array of values you can assign back to the column.

df['exp'] = (
    df.iloc[::-1]  # reverse the rows so a trailing window looks forward
      .groupby(df['Store']).rolling(3, min_periods=1)
      ['CopyCheck'].count()
      .sort_index(level=1)  # restore the original row order
      .to_numpy().astype(int)  # counts come back as floats; cast to int
)
print(df)
#     Store  Week  CopyCheck  Expected Next3CopyCount  exp
# 0       1     1       10.0                        2    2
# 1       1     2       10.0                        1    1
# 2       1     3        NaN                        1    1
# 3       1     4        NaN                        2    2
# 4       1     5       10.0                        3    3
# 5       1     6       10.0                        2    2
# 6       1     7       10.0                        1    1
# 7       2     1        NaN                        1    1
# 8       2     2       10.0                        2    2
# 9       2     3        NaN                        2    2
# 10      2     4       10.0                        3    3
# 11      2     5       10.0                        3    3
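Another option, if your pandas version provides `pandas.api.indexers.FixedForwardWindowIndexer` (added in pandas 1.1), is to make the window itself look forward instead of reversing the data. The sketch below is an alternative to the reversal approach, not the answerer's code: it runs the rolling count separately for each Store through `groupby(...).transform`, so each window is clipped at the end of its group rather than spilling into the next one.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({
    'Store': [1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
    'Week': [1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,11,12],
    'CopyCheck': [10,10,np.nan,np.nan,10,10,10,np.nan,10,np.nan,10,10,10,10,np.nan,np.nan,10,10,10],
    'Expected Next3CopyCount': [2,1,1,2,3,2,1,1,2,2,3,3,2,1,1,2,3,2,1],
})

# Each window covers the current row and the next two rows; near the end
# of a Series the window is cut short instead of reading past it.
forward3 = FixedForwardWindowIndexer(window_size=3)

# transform hands one Store's CopyCheck Series at a time to the lambda,
# so the forward window never crosses into another Store.
df['exp'] = (
    df.groupby('Store')['CopyCheck']
      .transform(lambda s: s.rolling(forward3, min_periods=1).count())
      .astype(int)  # rolling count returns floats
)

print((df['exp'] == df['Expected Next3CopyCount']).all())  # True
```

Because `transform` returns a result aligned with the original rows, no `sort_index` or `.to_numpy()` step is needed before assigning the column.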