I am trying to compute the mean of every n rows for a pandas column and to place that information in a new column next to the original rows. It looks like this:
import pandas as pd
import numpy as np
from random import randint
data = [randint(0, 9) for p in range(0, 20)]
time = np.arange(0,20,1)
for_df = list(zip(time, data))
df = pd.DataFrame(for_df, columns=['time', 'data'])
regionSize = 5
df['regional_mean'] = (
    df['data'].apply(pd.Series)
    .groupby(df.index // regionSize)
    .transform('mean')
    .apply(list, axis=1)
)
print(df)
time data regional_mean
0 0 2 [5.0]
1 1 3 [5.0]
2 2 6 [5.0]
3 3 8 [5.0]
4 4 6 [5.0]
5 5 3 [4.6]
6 6 2 [4.6]
7 7 6 [4.6]
8 8 5 [4.6]
9 9 7 [4.6]
10 10 6 [4.8]
11 11 9 [4.8]
12 12 0 [4.8]
13 13 0 [4.8]
14 14 9 [4.8]
15 15 6 [4.4]
16 16 2 [4.4]
17 17 3 [4.4]
18 18 7 [4.4]
19 19 4 [4.4]
Instead of fixed, non-overlapping windows of n rows, I want an overlapping (sliding) window of n rows. For example, let's stick with a region size of 5. I would like to do the following:
- Compute the mean of the current value together with the previous 2 and next 2 values, and display that mean next to the original value.
- I would also like to be able to experiment with the position of the region. For example, sticking with the region size of 5, I would like to compute the mean over the previous 1 and next 3 values, or any other combination.
Basically, I want to use pandas' rolling method, but allow the window to access future data, not just previous data.
CodePudding user response:
rolling has a center parameter. Use it if you want your window to be centered on the current value:
df['regional_mean'] = df['data'].rolling(regionSize, center=True).mean()
If you want to change the position of the window, you can do the same followed by a shift. For example, to compute the mean over the previous 1 and next 3 values, shift the centered result up by one row:
df['regional_mean'] = df['data'].rolling(regionSize, center=True).mean().shift(-1)
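To experiment with other positions, the shift idea can be wrapped in a small helper. This is only a sketch: the function name offset_rolling_mean and its before/after parameters are made up for illustration and are not part of pandas. It uses an ordinary trailing window and shifts the result, so it also works when the window cannot be centered.

```python
import pandas as pd

def offset_rolling_mean(series: pd.Series, before: int, after: int) -> pd.Series:
    """Hypothetical helper: mean over `before` previous rows, the current row,
    and `after` following rows."""
    window = before + after + 1
    # A trailing window that ends at row i + after spans rows
    # i - before .. i + after, so shifting the trailing result up by
    # `after` rows aligns that mean with row i.
    return series.rolling(window).mean().shift(-after)

data = pd.Series([2, 3, 6, 8, 6, 3, 2, 6, 5, 7], dtype=float)
centered = offset_rolling_mean(data, before=2, after=2)  # same as center=True
skewed = offset_rolling_mean(data, before=1, after=3)    # previous 1, next 3
```

Rows whose window would run past either end of the data come out as NaN, just as with rolling itself.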
CodePudding user response:
You can use pandas.DataFrame.rolling:
df["regional_mean"] = df["data"].rolling(5, center=True).mean()
df["regional_mean"] = df["regional_mean"].fillna(5)
Documentation for pandas.DataFrame.rolling:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html
With a centered window of 5 you get NaN values for the first two and last two positions, where the window runs past the edge of the data. I have filled those with 5 (I think that is what you did in your example).
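If a constant fill value is not what you want, one alternative (a sketch, assuming it is acceptable for edge rows to average over fewer than 5 values) is the min_periods parameter of rolling, which lets incomplete windows still produce a result:

```python
import pandas as pd

data = pd.Series([2, 3, 6, 8, 6, 3, 2, 6, 5, 7], dtype=float)

# min_periods=1 lets windows at the edges use however many rows exist,
# so no NaN values are produced and no fill value is needed.
regional_mean = data.rolling(5, center=True, min_periods=1).mean()
```

Here the first row averages only rows 0 to 2 and the last row only the final three rows, so the edge values are based on smaller samples than the interior ones.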