I have a dataframe with four columns. The column 'Intensity' has 3 groups (0, 50, 100). I would like to average only the last 2 values of the column 'Value' within each of the 3 'Intensity' groups. Then I would like to build a new dataframe with the columns 'Replication', 'Regime', 'Intensity', 'Value_mean' and 'Value_sd', the last two being the calculated average and standard deviation.
Replication Regime Intensity Value
1 Ctrl 0 2
1 Ctrl 0 3
1 Ctrl 0 4
1 Ctrl 0 5
1 Ctrl 0 6
1 Ctrl 0 7
1 Ctrl 50 1
1 Ctrl 50 2
1 Ctrl 50 2
1 Ctrl 50 4
1 Ctrl 50 6
1 Ctrl 50 6
1 Ctrl 100 2
1 Ctrl 100 1
1 Ctrl 100 0
2 Ctrl 100 3
2 Ctrl 0 7
2 Ctrl 0 3
2 Ctrl 0 6
2 Ctrl 0 2
2 Ctrl 0 1
2 Ctrl 0 5
2 Ctrl 50 12
2 Ctrl 50 22
2 Ctrl 50 52
2 Ctrl 50 22
2 Ctrl 50 2
2 Ctrl 50 2
2 Ctrl 100 22
2 Ctrl 100 2
2 Ctrl 100 25
So far I have used apply, but I get a Series instead of a DataFrame:
df2 = df1.groupby(['Regime','Intensity']).apply(lambda x:
x.tail(3).mean(axis=0,level=0))
and I get
                  Intensity     A
Regime Intensity
Ctrl   0                  0 -0.87
       50                50  2.08
       100              100  4.84
CodePudding user response:
Use GroupBy.tail in the first step to keep the last 3 rows of each group, then create the new columns with GroupBy.transform:
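# Keep the last 3 rows of each (Regime, Intensity) group
df2 = df1.groupby(['Regime','Intensity']).tail(3).copy()
# Broadcast the per-Regime mean and standard deviation onto every kept row
df2['mean_val'] = df2.groupby('Regime')['Value'].transform('mean')
df2['std_val'] = df2.groupby('Regime')['Value'].transform('std')
print(df2)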
    Replication  Regime  Intensity  Value  mean_val    std_val
19            2    Ctrl          0      2  9.222222  10.425663
20            2    Ctrl          0      1  9.222222  10.425663
21            2    Ctrl          0      5  9.222222  10.425663
25            2    Ctrl         50     22  9.222222  10.425663
26            2    Ctrl         50      2  9.222222  10.425663
27            2    Ctrl         50      2  9.222222  10.425663
28            2    Ctrl        100     22  9.222222  10.425663
29            2    Ctrl        100      2  9.222222  10.425663
30            2    Ctrl        100     25  9.222222  10.425663
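In the output above, transform groups by 'Regime' only, so every kept row carries the same mean and standard deviation. If the goal is one row per group with 'Value_mean' and 'Value_sd', a minimal sketch could aggregate instead of transform. The small frame, the key list keys, and n_last below are assumptions for illustration, not the question's full data. The question text asks for the last 2 values while its code uses tail(3), so set n_last to whichever is intended.

```python
import pandas as pd

# Hypothetical small frame with the same columns as the question's df1.
df1 = pd.DataFrame({
    "Replication": [1] * 6 + [2] * 6,
    "Regime": ["Ctrl"] * 12,
    "Intensity": [0, 0, 0, 50, 50, 50, 0, 0, 0, 50, 50, 50],
    "Value": [2, 3, 4, 1, 2, 6, 7, 3, 5, 12, 22, 2],
})

# Assumed grouping keys: one summary row per Replication/Regime/Intensity.
keys = ["Replication", "Regime", "Intensity"]
n_last = 2  # assumed number of trailing rows to keep per group

# Step 1: keep only the last n_last rows of each group.
last_rows = df1.groupby(keys, sort=False).tail(n_last)

# Step 2: aggregate the kept rows into a mean and a sample standard deviation.
summary = (
    last_rows.groupby(keys, sort=False)["Value"]
    .agg(Value_mean="mean", Value_sd="std")
    .reset_index()
)
print(summary)
```

reset_index turns the group keys back into ordinary columns, so the result is a DataFrame with exactly the five requested columns. Drop 'Replication' from keys to pool the replications within each Intensity.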