I am trying to compute a kind of block average: out of 90 rows, every 3 consecutive rows in column A should be averaged, and that average should be written to the same rows in column B. For example, from this:
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8], 'B': [0, 0, 0, 0, 0, 0]})
to this:
A  B
2  3
3  3
4  3
7  8
9  8
8  8
I tried running this code:
x = 0
for i in df['A']:
    if x < 90:
        y = (df['A'][x] + df['A'][(x + 1)] + df['A'][(x + 2)]) / 3
        df['B'][x] = y
        df['B'][(x + 1)] = y
        df['B'][(x + 2)] = y
        x = x + 3
        print(y)
It does print the correct y, but it does not change B.
I know there is a better way to do it, and if anyone knows - it would be great if they shared it. But the more important thing for me is to understand why what I wrote down doesn't have an effect on the df.
CodePudding user response:
You could group by the index integer-divided by 3, then use transform to compute the mean of each group and assign the result to B:
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8], 'B': [0, 0, 0, 0, 0, 0]})
df['B'] = df.groupby(df.index // 3)['A'].transform('mean')
Output:
   A    B
0  2  3.0
1  3  3.0
2  4  3.0
3  7  8.0
4  9  8.0
5  8  8.0
Note that this relies on the index being of the form 0, 1, 2, 3, 4, ... If that is not the case, you could either reset the index (df.reset_index(drop=True), which returns a new DataFrame) or group on np.arange(df.shape[0]) instead (with numpy imported as np), i.e.
df['B'] = df.groupby(np.arange(df.shape[0]) // 3)['A'].transform('mean')
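To make the index caveat concrete, here is a minimal sketch. The frame with index 10..15 is a made-up example (for instance, what is left after filtering rows), and the names by_index and position are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0, 1, 2, ...
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8]}, index=[10, 11, 12, 13, 14, 15])

# Grouping on the raw index splits the rows into the wrong blocks:
# 10 // 3 == 3, 11 // 3 == 3, 12 // 3 == 4, 13 // 3 == 4, 14 // 3 == 4, 15 // 3 == 5
by_index = df.groupby(df.index // 3)['A'].transform('mean')

# Grouping on the row position gives blocks of three consecutive rows.
position = np.arange(len(df))
df['B'] = df.groupby(position // 3)['A'].transform('mean')

print(by_index.tolist())
print(df)
```

With the raw index, the first group contains only the rows labelled 10 and 11, so the averages no longer line up with blocks of three. The positional version does not depend on the index labels at all.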
CodePudding user response:
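import pandas as pd

i = 0
batch_size = 3
# B starts as float so the averages can be stored without changing the column dtype
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8, 9, 10], 'B': [-1.0] * 8})
while i < len(df):
    # last row label of the current batch (.loc slices include both endpoints)
    j = min(i + batch_size - 1, len(df) - 1)
    avg = sum(df.loc[i:j, 'A']) / (j - i + 1)
    df.loc[i:j, 'B'] = [avg] * (j - i + 1)
    i += batch_size
df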
Corner case: when len(df) % batch_size != 0, this takes the average of the leftover rows (here the last two rows, 9 and 10, get 9.5).
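As for why the loop in the question has no effect: a likely explanation, though I can't confirm it without knowing the pandas version and how df was created, is chained indexing. df['B'][x] = y first selects df['B'] and then assigns into that intermediate object. pandas does not guarantee that this object is a view of df, and with Copy-on-Write enabled the assignment never reaches df. Under that assumption, a sketch of the same loop using a single .loc call looks like this:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8], 'B': [0.0] * 6})

# Step through the rows three at a time. This assumes the default 0, 1, 2, ...
# index and a length that is a multiple of 3, as in the question.
for x in range(0, len(df), 3):
    y = (df.loc[x, 'A'] + df.loc[x + 1, 'A'] + df.loc[x + 2, 'A']) / 3
    # One .loc call with row and column together assigns into df itself.
    # .loc slices include both endpoints, so this covers rows x, x + 1, x + 2.
    df.loc[x:x + 2, 'B'] = y

print(df)
```

Using range(0, len(df), 3) also removes the need for the separate counter and the hard-coded 90.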