Home > Mobile >  Subtracting rolling mean from pandas without for loop with unevenly sampled data and no datetime
Subtracting rolling mean from pandas without for loop with unevenly sampled data and no datetime

Time:10-04

I have a dataset with a column DAY, and a column VALUE. I would like to create a new column, VALUE_SUB which is made by subtracting a rolling window mean with width N of VALUE which rolls over DAY from VALUE. So, I have a dataframe like this:

DAY,   VALUE
1      8
2      6
4      7
5      6
8      10
10     8

And I want to subtract with a rolling mean of width 3 to get:

DAY,   VALUE,   VALUE_SUB
1      8        1
2      6        -1
4      7        0.5
5      6        -0.5
8      10       1
9      8        0.33
10     5        -1.5

I can do this with the following code:

out = []
for i in range(len(list(df['DAY']))):
    df1 = df[(df['DAY'] >= df['DAY'].iloc[i] - 1) & (df['DAY'] <= df['DAY'].iloc[i]   1)]
    out  = [df['VALUE'].iloc[i] - df1['VALUE'].mean()]
df['VALUE_SUB'] = out

But my dataframe is very large so this takes forever. I would like to do it without using a for loop at all, if possible. I know pandas has a rolling mean function that works for datetime objects or for rolling over a set number of rows, but my data does not have datetime objects nor is the data evenly sampled.

CodePudding user response:

You can change the DAY column to a datetime object with pd.to_datetime and set as the index to be able to use the rolling.mean with a centered window. keep only the values (to avoid mismatch on the index) and substract from the VALUE column.

# input
df = pd.DataFrame({'DAY': [1, 2, 4, 5, 8, 9, 10], 'VALUE': [8, 6, 7, 6, 10, 8, 5]})

n_day = 3
df['VALUE_SUB'] = (
    df['VALUE'] 
    - df.set_index(pd.to_datetime(df['DAY'], unit='d'))
        ['VALUE'].rolling(f'{n_day}d', center=True).mean().to_numpy()
)
print(df)
#    DAY  VALUE  VALUE_SUB
# 0    1      8   1.000000
# 1    2      6  -1.000000
# 2    4      7   0.500000
# 3    5      6  -0.500000
# 4    8     10   1.000000
# 5    9      8   0.333333
# 6   10      5  -1.500000
  • Related