I have the following sample data set, which includes my expected output column. My real data has empty (NaN) values, so I added some np.nan values as an example.
import numpy as np
import pandas as pd

data={'Store':[1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
'Week':[1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,11,12],
'CopyCheck':[10,10,np.nan,np.nan,10,10,10,np.nan,10,np.nan,10,10,10,10,np.nan,np.nan,10,10,10],
'Expected Next3CopyCount':[2,1,1,2,3,2,1,1,2,2,3,3,2,1,1,2,3,2,1]}
df=pd.DataFrame.from_dict(data)
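To make the meaning of the Expected Next3CopyCount column concrete (non-empty CopyCheck values in the current row plus the next two rows of the same Store, with fewer rows counted at the end of a store), here is a slow, loop-based reference sketch. The helper name next_n_nonnull is hypothetical, not a pandas function, and it assumes each store's rows are contiguous and sorted by Week, as in the sample. It is only meant for checking a vectorized solution against, not for large data.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
data = {'Store': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        'Week': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        'CopyCheck': [10, 10, np.nan, np.nan, 10, 10, 10, np.nan, 10, np.nan,
                      10, 10, 10, 10, np.nan, np.nan, 10, 10, 10],
        'Expected Next3CopyCount': [2, 1, 1, 2, 3, 2, 1, 1, 2, 2, 3, 3, 2, 1, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data)


def next_n_nonnull(stores, values, n=3):
    """Hypothetical helper: for each row i, count non-NaN values in rows
    i..i+n-1 that belong to the same store as row i.

    Assumes each store's rows are contiguous and already sorted by week.
    """
    counts = []
    for i in range(len(values)):
        # Window stops at the end of the frame and skips rows of another store.
        window = [values[j] for j in range(i, min(i + n, len(values)))
                  if stores[j] == stores[i]]
        counts.append(sum(not pd.isna(v) for v in window))
    return counts


df['reference'] = next_n_nonnull(df['Store'].tolist(), df['CopyCheck'].tolist())
print(df[['Store', 'Week', 'CopyCheck', 'reference']])
```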
I tried the code below to do a forward-looking rolling count of the non-empty rows in the DataFrame (including the current row). At the end of each group, where fewer than 3 rows remain, my code returns 0 or NaN instead of counting the rows that are left. How can I adjust my code to get the result I am looking for?
EDIT: I also noticed that the rolling count spills over into the next group and adds a false count to the last 3 rows of Store 1. Why doesn't grouping by Store keep the groups separate?
df['Next3Copies'] = df['CopyCheck'].groupby(df['Store']).rolling(3, min_periods=1).count().shift(-2).values
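The spill-over comes from the order of operations: groupby(...).rolling(...).count() is computed per Store, but .shift(-2) is then called on the combined result, so the last rows of Store 1 receive the backward-looking counts from the first rows of Store 2, and the last two rows of the whole frame end up NaN. A minimal sketch that keeps every step inside the group, assuming a pandas version whose GroupBy.shift accepts fill_value: flag non-missing rows as 1/0, then add the flag of the current row and of the next two rows in the same store.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
data = {'Store': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        'Week': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        'CopyCheck': [10, 10, np.nan, np.nan, 10, 10, 10, np.nan, 10, np.nan,
                      10, 10, 10, 10, np.nan, np.nan, 10, 10, 10],
        'Expected Next3CopyCount': [2, 1, 1, 2, 3, 2, 1, 1, 2, 2, 3, 3, 2, 1, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data)

# 1 where CopyCheck has a value, 0 where it is NaN.
present = df['CopyCheck'].notna().astype(int)
by_store = present.groupby(df['Store'])

# shift() on the GroupBy object moves values only within each Store, and
# fill_value=0 stands for "no row there" at the end of a store instead of NaN.
df['Next3Copies'] = (present
                     + by_store.shift(-1, fill_value=0)
                     + by_store.shift(-2, fill_value=0))
print(df)
```

Writing out one shift per row is fine for a window of 3 but gets clumsy for larger windows, where a rolling approach is a better fit.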
Answer:
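One way to do a forward-looking rolling count is to reverse the data with iloc[::-1]; after that, it is pretty much what you already did with groupby.rolling.count. The result is indexed by (Store, original row label) and is in reversed order, so to assign the values back you can sort_index on the original-index level and take the array of values. Since no shift is needed afterwards, nothing leaks from one store into the next.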
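df['exp'] = (
    df.iloc[::-1]  # reverse the rows so a trailing window looks forward
    .groupby(df['Store']).rolling(3, min_periods=1)
    ['CopyCheck'].count()  # non-NaN values in this row and the next 2 of the same Store
    .sort_index(level=1)  # level 1 holds the original row labels: restore the original order
    .to_numpy().astype(int)  # drop the (Store, row) MultiIndex so the values assign by position
)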
print(df)
#     Store  Week  CopyCheck  Expected Next3CopyCount  exp
# 0       1     1       10.0                        2    2
# 1       1     2       10.0                        1    1
# 2       1     3        NaN                        1    1
# 3       1     4        NaN                        2    2
# 4       1     5       10.0                        3    3
# 5       1     6       10.0                        2    2
# 6       1     7       10.0                        1    1
# 7       2     1        NaN                        1    1
# 8       2     2       10.0                        2    2
# 9       2     3        NaN                        2    2
# 10      2     4       10.0                        3    3
# 11      2     5       10.0                        3    3
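A variation on the same reverse-then-roll idea, sketched for the case where you would rather not rebuild the row order with sort_index: apply it per group with transform, which returns a Series aligned to the original index, so it can be assigned directly. It also sums 0/1 flags from notna() instead of calling count(), so the result does not depend on how count handles windows that contain no values. The helper name forward_count is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
data = {'Store': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        'Week': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        'CopyCheck': [10, 10, np.nan, np.nan, 10, 10, 10, np.nan, 10, np.nan,
                      10, 10, 10, 10, np.nan, np.nan, 10, 10, 10],
        'Expected Next3CopyCount': [2, 1, 1, 2, 3, 2, 1, 1, 2, 2, 3, 3, 2, 1, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data)


def forward_count(s, window=3):
    """Hypothetical helper: non-NaN values in each row and the next
    window-1 rows of the same group (fewer rows at the end of the group)."""
    flags = s.notna().astype(int)
    # Reverse, take an ordinary trailing rolling sum, then reverse back so the
    # result is in the group's original row order.
    return flags.iloc[::-1].rolling(window, min_periods=1).sum().iloc[::-1]


df['exp'] = df.groupby('Store')['CopyCheck'].transform(forward_count).astype(int)
print(df)
```

Because transform keeps the original index, this also works if the frame is not sorted by Store, as long as each store's rows are in Week order.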