Home > Back-end >  How to get the mean values per different group of rows with pandas?
How to get the mean values per different group of rows with pandas?

Time:12-22

I have an ascii file as following

          7.00000000              5.61921453
         18.00000000              9.75818253
         13.00000000             37.94074631
         18.00000000             29.54162407
         10.00000000             18.82115364
         13.00000000             15.00485802
         16.00000000             19.24893761
         20.00000000             22.59035683
         17.00000000             59.69598007
         17.00000000             34.07574844
         18.00000000             24.17820358
         13.00000000             24.70093536
         11.00000000             23.37569046
         14.00000000             34.14352036
         13.00000000             33.33922577
         16.00000000             36.64311981
         20.00000000             60.21446609
         20.00000000             33.54150391
         18.00000000             40.84828949
         21.00000000             40.31245041
         34.00000000             91.71004486
         40.00000000             93.24317169
         42.00000000             43.94712067
         12.00000000             32.73310471
          7.00000000             25.25534248
          9.00000000             23.14623833

I want to calculate (for both columns, separately) the mean values of the first 10 rows, then the next 11 rows, then the next 5 rows so as to get the following output

14.9    25.2296802
18  40.2734046
22  43.6649956

How I could do that in python with pandas? In case I would have a stable group of rows (e.g. per 10 rows) I would do the following

df = pd.read_csv(i,sep='\t',header=None)
df_mean=df.groupby(np.arange(len(df))//10).mean()

CodePudding user response:

Use numpy.repeat to craft groups (here a/b/c) with arbitrary lengths:

import numpy as np
means = df.groupby(np.repeat(['a', 'b', 'c'], [10, 11, 5])).mean()

output:

      0          1
a  14.9  25.229680
b  18.0  40.273405
c  22.0  43.664996

If you don't care about group names:

groups = [10, 11, 5]
means = df.groupby(np.repeat(np.arange(len(groups)), groups)).mean()

output:

      0          1
0  14.9  25.229680
1  18.0  40.273405
2  22.0  43.664996
  • Related