Home > front end >  Rolling function in pandas with condition
Rolling function in pandas with condition

Time:04-27

I have a dataframe with the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2,
        "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "x": [1, 2, 2, 3, 2, 3, 4, 2],
        "condition": [1, 0, 1, 0] * 2
    }
)
df

I want to calculate, the rolling average of the column x:

  • Per group
  • Using only past data (not using the current row)
  • Using only data for the rolling average where condition = 1.

The outcome should be the following:

enter image description here

How can I do that in pandas? Thanks!

CodePudding user response:

I think we should filter the dataframe on conditions and then calculate the mean of x

  • group == group of current row
  • date < date of current row
  • condition == 1

df.apply is used to apply to all rows of the dataframe

df['rolling_avg_x'] = df.apply(lambda x: df[(df.group == x.group) & (df.date < x.date) & (df.condition == 1)].x.mean(), axis=1)

This will give you the output as desire

  • Related