Average of rows with ranges in Pandas dataframe

For the following dataframe

              AA         BB        CC        DD
0   4.456648e+02  36.120182  1.707122  0.332993
1   3.974615e+02   8.733798  0.346957  0.332993
2   4.750258e+00   5.197949  0.365944  0.332993

I want to compute the row-wise average over ranges of columns, as described here. For example, I wrote:

df['mean1'] = df.iloc[:, 0:1].mean(axis=1)
df['mean2'] = df.iloc[:, 2:3].mean(axis=1)

I expected mean1 to be the average of AA and BB, and mean2 the average of CC and DD. But that is not what I get, as you can see below:

              AA         BB        CC        DD         mean1     mean2
0   4.456648e+02  36.120182  1.707122  0.332993  4.456648e+02  1.707122
1   3.974615e+02   8.733798  0.346957  0.332993  3.974615e+02  0.346957
2   4.750258e+00   5.197949  0.365944  0.332993  4.750258e+00  0.365944

How can I fix that?

CodePudding user response:

Python slice end values are exclusive, not inclusive, so 0:1 selects only AA and 2:3 selects only CC, and each "mean" is computed over a single column. You need to increase your end values by 1 to include BB and DD in each mean:

df['mean1'] = df.iloc[:, 0:2].mean(axis=1)
df['mean2'] = df.iloc[:, 2:4].mean(axis=1)

Output:

           AA         BB        CC        DD       mean1     mean2
0  445.664800  36.120182  1.707122  0.332993  240.892491  1.020058
1  397.461500   8.733798  0.346957  0.332993  203.097649  0.339975
2    4.750258   5.197949  0.365944  0.332993    4.974104  0.349468
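
To check the fix above, here is a minimal sketch that rebuilds the frame from the values posted in the question (the construction itself is my reconstruction, not the asker's code). It also shows two alternatives that avoid counting positions: a label slice with loc, which includes both endpoints, and an explicit column list. The names mean1_pos, mean1_lab and mean2_list are only illustrative.

```python
import pandas as pd

# Frame rebuilt from the values posted in the question.
df = pd.DataFrame({
    "AA": [445.6648, 397.4615, 4.750258],
    "BB": [36.120182, 8.733798, 5.197949],
    "CC": [1.707122, 0.346957, 0.365944],
    "DD": [0.332993, 0.332993, 0.332993],
})

# Positional slice: stop is exclusive, so 0:2 covers columns 0 and 1.
mean1_pos = df.iloc[:, 0:2].mean(axis=1)

# Label slice: both endpoints are included, so "AA":"BB" covers the same columns.
mean1_lab = df.loc[:, "AA":"BB"].mean(axis=1)

# Explicit column list: no slicing rules to remember at all.
mean2_list = df[["CC", "DD"]].mean(axis=1)

print(mean1_pos)
print(mean2_list)
```

Selecting by label tends to be more robust if the column order might change later, whereas iloc depends on the columns staying in the same positions.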

CodePudding user response:

In Python, slices have the syntax start:stop:step, where the result goes from start (inclusive) up to (but not including) stop, in increments of step. The third argument, step, is 1 by default.

In your case, the column slice 0:1 includes only column 0 (AA). Similarly, 2:3 includes only column 2 (CC).
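
The same rule can be seen on a plain Python list, which is a quick way to sanity-check a slice before using it with iloc. This is just an illustrative sketch; the list cols stands in for the column positions of the frame in the question.

```python
# Column names in the same order as the DataFrame in the question.
cols = ["AA", "BB", "CC", "DD"]

# start is included, stop is excluded.
only_aa = cols[0:1]        # a single element: position 0
aa_bb = cols[0:2]          # positions 0 and 1
cc_dd = cols[2:4]          # positions 2 and 3

# An explicit step of 2 takes every other element.
every_other = cols[0:4:2]  # positions 0 and 2

print(only_aa, aa_bb, cc_dd, every_other)
```

A slice start:stop always yields stop - start elements (when both are within bounds), so a two-column range needs stop = start + 2.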