I have a multi index dataframe that looks like this:
For each item, I would like to count the number of times in the last two periods the qty has been greater than or equal to 18. The answer is written to a new df that has the item number and number of occurrences such as:
This is what I tried to do:
import pandas as pd
df = pd.DataFrame({'item':[1,1,1,2,2,3],'date':['2020-03-31','2020-06-30','2020-09-30','2021-04-15','2021-06-30','2021-06-30',],'warehouse':[10,11,12,13,14,15], 'qty':[16,17,18,19,20,21]}).set_index(['item','date'])
df
new_df = []
new_df['occurs'] = df['qty'].tail(2).ge(2).sum()
new_df
This produces an error:
TypeError: list indices must be integers or slices, not str
Can someone point me in the right direction?
CodePudding user response:
not sure why 3 gets NaN in your example, but this can be done by groupby and apply functions:
new_df = df.groupby(level=0).apply(lambda x:(x.tail(2).qty>=18).sum())
new_df
1 1
2 2
3 1