How to create a windowed mean column in pandas

Time:11-18

I am trying to compute the mean of every n rows for a pandas column and to place that information in a new column next to the original rows. It looks like this:

import pandas as pd
import numpy as np
from random import randint

data = [randint(0, 9) for p in range(0, 20)]
time = np.arange(0,20,1)
for_df = list(zip(time, data))

df = pd.DataFrame(for_df, columns=['time', 'data'])

regionSize = 5
df['regional_mean'] = (
    df['data'].apply(pd.Series)
        .groupby(df.index // regionSize)
        .transform('mean')
        .apply(list, axis=1)
    )
print(df)
    time  data regional_mean
0      0     2         [5.0]
1      1     3         [5.0]
2      2     6         [5.0]
3      3     8         [5.0]
4      4     6         [5.0]
5      5     3         [4.6]
6      6     2         [4.6]
7      7     6         [4.6]
8      8     5         [4.6]
9      9     7         [4.6]
10    10     6         [4.8]
11    11     9         [4.8]
12    12     0         [4.8]
13    13     0         [4.8]
14    14     9         [4.8]
15    15     6         [4.4]
16    16     2         [4.4]
17    17     3         [4.4]
18    18     7         [4.4]
19    19     4         [4.4]

Instead of creating fixed windows every n rows, I want to create overlapping windows of n rows. For example, let's stick with a region size of 5. I would like to do the following:

  • Compute the mean of the previous 2 and next 2 values and display that mean next to the original value.
  • I would also like to be able to experiment with the position of the region. For example, sticking with the region size of 5, I would like to compute the mean of the previous 1 and next 3 values, or any other combination.

Basically, I want to use the pandas rolling method, but allow the window to access future data, not just previous data.

CodePudding user response:

rolling has a center parameter if you want the window to be centered on the current value:

df['regional_mean'] = df['data'].rolling(regionSize, center=True).mean()

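If you want to change the position of the window, you can do the same followed by a shift. Shifting by -1 moves each result up one row, so each row receives the mean of the window centered on the row after it. For example, to compute the mean of the previous 1 and next 3 values: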

df['regional_mean'] = df['data'].rolling(regionSize, center=True).mean().shift(-1)
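The shift trick generalizes to any split of the window. Below is a minimal sketch, assuming a hypothetical helper named offset_window_mean (not part of pandas) and a small fixed series in place of the random data from the question. It builds an ordinary trailing window and shifts the result back by the number of "next" rows, then checks that this matches the centered-plus-shift version above.

```python
import pandas as pd


def offset_window_mean(series, before, after):
    """Mean over `before` previous rows, the current row, and `after` next rows.

    Hypothetical helper for illustration; not a pandas function.
    """
    window = before + after + 1
    # A trailing window ending at row j covers rows j-window+1 .. j.
    # Shifting by -after moves the result from row i+after back to row i,
    # so row i ends up with the mean of rows i-before .. i+after.
    return series.rolling(window).mean().shift(-after)


# Fixed sample data (an assumption, used so the results are reproducible).
s = pd.Series([2, 3, 6, 8, 6, 3, 2, 6, 5, 7], dtype=float)

# Previous 1 and next 3 values, computed two ways.
centered_shift = s.rolling(5, center=True).mean().shift(-1)
helper = offset_window_mean(s, before=1, after=3)

print(pd.DataFrame({"data": s, "centered_shift": centered_shift, "helper": helper}))
```

Rows whose window would run past either end of the series come out as NaN. With before=1 and after=3, that is the first row and the last three rows.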

CodePudding user response:

You can use pandas.DataFrame.rolling

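df["regional_mean"] = df["data"].rolling(5, center=True).mean()
# Assign the result back instead of calling fillna(..., inplace=True) on the
# selected column, which may not update df in recent pandas versions.
df["regional_mean"] = df["regional_mean"].fillna(5)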

pandas.DataFrame.rolling
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html

You get NaN values at the first two and last two positions, where the centered window is incomplete. I have filled them with 5 (I think that is what you did in your example).
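As an alternative to filling the edges with a constant, rolling also accepts a min_periods argument. The sketch below, which assumes a small fixed series rather than the random data from the question, compares the default behavior with min_periods=1, where the edge rows are averaged over whatever part of the window exists.

```python
import pandas as pd

# Fixed sample data (an assumption, used so the results are reproducible).
s = pd.Series([2, 3, 6, 8, 6, 3, 2, 6, 5, 7], dtype=float)

# Default: a full window of 5 values is required, so the edges are NaN.
strict = s.rolling(5, center=True).mean()

# min_periods=1: a result is produced as soon as one value is in the window,
# so the edge rows use a shorter window instead of becoming NaN.
partial = s.rolling(5, center=True, min_periods=1).mean()

print(pd.DataFrame({"data": s, "strict": strict, "partial": partial}))
```

Whether a shortened window is acceptable at the edges depends on the analysis; the edge values are means over fewer points and are therefore noisier than the interior ones.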
