I have a dataframe that contains for a specific timestamp, the number of items on a specific event.
matrix = pd.DataFrame()
matrix['Timestamp'] = [Timestamp('2019-12-01 11:25:32.060000'),
Timestamp('2019-12-01 15:24:38.527000'),
Timestamp('2019-12-02 17:09:57.907000'),
Timestamp('2019-12-02 18:06:38.883000'),
Timestamp('2019-12-02 19:16:43.667000'),
Timestamp('2019-12-03 20:59:10.727000')]
matrix['N° Items'] = [1,2,1,4,5,7]
matrix
I want to calcule for every row, the sum of the number of items in the last 30 minutes. I solved the problem with the following:
def in_Range(delta):
seconds = delta.total_seconds()
return (seconds>0) and (seconds<60*30)
def func(ts):
return matrix[(ts-matrix['Timestamp']).apply(in_Range)]['N° Items'].sum()
matrix['N° Items Last 30 Minutes'] = matrix['Timestamp'].apply(func)
I would like to know if there are other (hopefully better) ways to solve this problem. This solution is very slow (even using multiprocessing) and my real dataframe is big.
CodePudding user response:
DataFrame.rolling
is what you are looking for. The function only works if your dataframe's index is a Timestamp series:
result = (
matrix.set_index("Timestamp")
.assign(**{
"N° Items Last 30 Minutes": lambda x: x["N° Items"].rolling("30T").sum() - x["N° Items"]
})
)