Home > database >  Find the number of elements in a DataFrame in the last 30 minutes
Find the number of elements in a DataFrame in the last 30 minutes

Time:03-14

I have a dataframe that contains for a specific timestamp, the number of items on a specific event.

matrix = pd.DataFrame()
matrix['Timestamp'] = [Timestamp('2019-12-01 11:25:32.060000'),
 Timestamp('2019-12-01 15:24:38.527000'),
 Timestamp('2019-12-02 17:09:57.907000'),
 Timestamp('2019-12-02 18:06:38.883000'),
 Timestamp('2019-12-02 19:16:43.667000'),
 Timestamp('2019-12-03 20:59:10.727000')]
matrix['N° Items'] = [1,2,1,4,5,7]
matrix

enter image description here

I want to calcule for every row, the sum of the number of items in the last 30 minutes. I solved the problem with the following:

def in_Range(delta):
    seconds = delta.total_seconds()
    return (seconds>0) and (seconds<60*30)

def func(ts):
    return matrix[(ts-matrix['Timestamp']).apply(in_Range)]['N° Items'].sum()

matrix['N° Items Last 30 Minutes'] = matrix['Timestamp'].apply(func)

I would like to know if there are other (hopefully better) ways to solve this problem. This solution is very slow (even using multiprocessing) and my real dataframe is big.

CodePudding user response:

DataFrame.rolling is what you are looking for. The function only works if your dataframe's index is a Timestamp series:

result = (
    matrix.set_index("Timestamp")
    .assign(**{
        "N° Items Last 30 Minutes": lambda x: x["N° Items"].rolling("30T").sum() - x["N° Items"]
    })
)
  • Related