Average of rows with ranges in Pandas dataframe

Time:06-26

Given the following dataframe:

              AA         BB        CC        DD
0   4.456648e+02  36.120182  1.707122  0.332993
1   3.974615e+02   8.733798  0.346957  0.332993
2   4.750258e+00   5.197949  0.365944  0.332993

I want to compute the row-wise average over ranges of columns. For example, I wrote:

df['mean1'] = df.iloc[:, 0:1].mean(axis=1)
df['mean2'] = df.iloc[:, 2:3].mean(axis=1)

I expected mean1 to be the average of AA and BB, and mean2 the average of CC and DD. But as you can see below, it isn't:

              AA         BB        CC        DD         mean1     mean2
0   4.456648e+02  36.120182  1.707122  0.332993  4.456648e+02  1.707122
1   3.974615e+02   8.733798  0.346957  0.332993  3.974615e+02  0.346957
2   4.750258e+00   5.197949  0.365944  0.332993  4.750258e+00  0.365944

How can I fix that?

Answer:

Python slice end values are exclusive, not inclusive, so 0:1 selects only AA and 2:3 selects only CC; each "mean" is just a copy of that single column. Increase each end value by 1 to include BB and DD:

df['mean1'] = df.iloc[:, 0:2].mean(axis=1)
df['mean2'] = df.iloc[:, 2:4].mean(axis=1)

Output:

           AA         BB        CC        DD       mean1     mean2
0  445.664800  36.120182  1.707122  0.332993  240.892491  1.020058
1  397.461500   8.733798  0.346957  0.332993  203.097649  0.339975
2    4.750258   5.197949  0.365944  0.332993    4.974104  0.349468
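To check the fix end to end, here is a minimal sketch that rebuilds the frame from the values in the question (with the e+02 notation expanded) and applies the corrected slices. It also shows an alternative that is not part of the answer above: label-based slicing with .loc includes both endpoints, so df.loc[:, "AA":"BB"] needs no +1 adjustment. The names mean1_loc and mean2_loc are only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the values shown in the question.
df = pd.DataFrame({
    "AA": [445.6648, 397.4615, 4.750258],
    "BB": [36.120182, 8.733798, 5.197949],
    "CC": [1.707122, 0.346957, 0.365944],
    "DD": [0.332993, 0.332993, 0.332993],
})

# Positional slicing with iloc: the stop index is exclusive,
# so 0:2 selects AA and BB, and 2:4 selects CC and DD.
df["mean1"] = df.iloc[:, 0:2].mean(axis=1)
df["mean2"] = df.iloc[:, 2:4].mean(axis=1)

# Label-based slicing with loc includes BOTH endpoints,
# so these give the same means without the "+1" adjustment.
mean1_loc = df.loc[:, "AA":"BB"].mean(axis=1)
mean2_loc = df.loc[:, "CC":"DD"].mean(axis=1)

print(df)
```

Label slices are convenient when the column names are stable; positional slices are convenient when only the column order is known.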

Answer:

In Python, slices have the syntax start:stop:step. The result runs from start (inclusive) up to, but not including, stop, in increments of step. The step defaults to 1.
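A quick illustration of those rules on a plain Python list of the column names (the list cols is only for illustration):

```python
cols = ["AA", "BB", "CC", "DD"]

# start is inclusive, stop is exclusive: each of these yields one element
print(cols[0:1])  # ['AA']
print(cols[2:3])  # ['CC']

# Raising stop by 1 pulls in the next element
print(cols[0:2])  # ['AA', 'BB']
print(cols[2:4])  # ['CC', 'DD']

# The optional third field is the step (default 1)
print(cols[0:4:2])  # ['AA', 'CC']
```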

In your case, the column slice 0:1 includes only column 0 (AA). Similarly, 2:3 includes only column 2 (CC), so each mean is taken over a single column.
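The same rule on the column axis can be checked directly by inspecting which columns each iloc slice returns. This is a minimal sketch using a one-row frame built from the first row of the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    [[445.6648, 36.120182, 1.707122, 0.332993]],
    columns=["AA", "BB", "CC", "DD"],
)

# In df.iloc[rows, cols], the second slice selects columns by position.
print(list(df.iloc[:, 0:1].columns))  # ['AA']
print(list(df.iloc[:, 2:3].columns))  # ['CC']
print(list(df.iloc[:, 0:2].columns))  # ['AA', 'BB']

# A row-wise mean over a single column is just that column's value.
single = df.iloc[:, 0:1].mean(axis=1)
print(single.iloc[0], df["AA"].iloc[0])
```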
