I have a df which looks like this:
A B C
5.1 1.1 7.3
5.0 0.3 7.2
4.9 1.7 7.0
10.2 1.1 7.9
10.3 1.0 7.0
15.4 2.0 7.1
15.1 1.0 7.3
0.0 0.9 7.3
0.0 1.3 7.9
0.0 0.5 7.5
-5.1 1.0 7.3
-10.3 0.8 7.3
-10.1 1.0 7.1
I want to detect the range from column "A" and get the mean and std for all the columns and save the result in a new df.
Expected Output:
mean_A Std_A mean_B Std_B mean_C Std_C
5.0 ... 1.03 ... 7.17 ...
10.25 ... 1.05 ... 7.45 ...
... ... ... ... ... ...
So, I want to get the average from group of data based on column "A".
I am new to Python and SO. I hope I was able to explain my goal.
CodePudding user response:
Groups are defined by difference of values in A
is greater like 5
, pass to GroupBy.agg
and aggregate mean
with std
:
df = df.groupby(df.A.diff().abs().gt(5).cumsum()).agg(['mean','std'])
df.columns = df.columns.map(lambda x: f'{x[1]}_{x[0]}')
print (df)
mean_A std_A mean_B std_B mean_C std_C
A
0 5.00 0.100000 1.033333 0.702377 7.166667 0.152753
1 10.25 0.070711 1.050000 0.070711 7.450000 0.636396
2 15.25 0.212132 1.500000 0.707107 7.200000 0.141421
3 0.00 0.000000 0.900000 0.400000 7.566667 0.305505
4 -5.10 NaN 1.000000 NaN 7.300000 NaN
5 -10.20 0.141421 0.900000 0.141421 7.200000 0.141421