For the following dataframe:
             AA         BB        CC        DD
0  4.456648e+02  36.120182  1.707122  0.332993
1  3.974615e+02   8.733798  0.346957  0.332993
2  4.750258e+00   5.197949  0.365944  0.332993
I want to compute row-wise averages over ranges of columns, as described here. For example, I wrote:
df['mean1'] = df.iloc[:, 0:1].mean(axis=1)
df['mean2'] = df.iloc[:, 2:3].mean(axis=1)
So mean1 should be the average of AA and BB, and mean2 should be the average of CC and DD. But that isn't what I get, as you can see below:
             AA         BB        CC        DD         mean1     mean2
0  4.456648e+02  36.120182  1.707122  0.332993  4.456648e+02  1.707122
1  3.974615e+02   8.733798  0.346957  0.332993  3.974615e+02  0.346957
2  4.750258e+00   5.197949  0.365944  0.332993  4.750258e+00  0.365944
How can I fix that?
CodePudding user response:
Python slice end values are exclusive, not inclusive, so you are effectively taking the mean of AA alone and of CC alone. You need to increase your end values by 1 to include BB and DD in each mean:
df['mean1'] = df.iloc[:, 0:2].mean(axis=1)
df['mean2'] = df.iloc[:, 2:4].mean(axis=1)
Output:
AA BB CC DD mean1 mean2
0 445.664800 36.120182 1.707122 0.332993 240.892491 1.020058
1 397.461500 8.733798 0.346957 0.332993 203.097649 0.339975
2 4.750258 5.197949 0.365944 0.332993 4.974104 0.349468
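To check this end to end, here is a minimal sketch that rebuilds the dataframe from the values shown in the question (typed in by hand, so treat the construction as an assumption) and compares the two slices. The names narrow and wide are only illustrative.

```python
import pandas as pd

# Values copied from the question; 4.456648e+02 is 445.6648, and so on.
df = pd.DataFrame({
    "AA": [445.6648, 397.4615, 4.750258],
    "BB": [36.120182, 8.733798, 5.197949],
    "CC": [1.707122, 0.346957, 0.365944],
    "DD": [0.332993, 0.332993, 0.332993],
})

# The end of an iloc slice is exclusive: 0:1 is one column, 0:2 is two.
narrow = df.iloc[:, 0:1]
wide = df.iloc[:, 0:2]
print(list(narrow.columns))  # ['AA']
print(list(wide.columns))    # ['AA', 'BB']

# Corrected means. mean1 is appended at position 4, so 2:4 still
# selects only CC and DD on the next line.
df["mean1"] = df.iloc[:, 0:2].mean(axis=1)
df["mean2"] = df.iloc[:, 2:4].mean(axis=1)
print(df)
```

Selecting the columns by name, for example df[["AA", "BB"]].mean(axis=1), avoids the off-by-one question altogether if the column names are known.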
CodePudding user response:
In Python, slices have the syntax start:stop:step, where the result runs from start (inclusive) up to, but not including, stop, in increments of step. The step is 1 by default.
In your case, the slice sits in the column position of iloc, so 0:1 includes only column 0 (AA). Similarly, 2:3 includes only column 2 (CC).
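The start:stop:step rule is easy to try on a plain Python list. The sketch below uses a list of the column names as a stand-in for the dataframe columns; the variable names are only for illustration.

```python
cols = ["AA", "BB", "CC", "DD"]

# stop is exclusive, so 0:1 yields a single element
first_only = cols[0:1]
print(first_only)    # ['AA']

# raising stop by one includes the second element
first_pair = cols[0:2]
print(first_pair)    # ['AA', 'BB']

second_pair = cols[2:4]
print(second_pair)   # ['CC', 'DD']

# the optional step picks every n-th element between start and stop
every_other = cols[0:4:2]
print(every_other)   # ['AA', 'CC']
```

The positional slices passed to df.iloc follow the same rule, which is why 0:2 and 2:4 are the ranges that cover two columns each.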