I have an ASCII file like the following:
7.00000000 5.61921453
18.00000000 9.75818253
13.00000000 37.94074631
18.00000000 29.54162407
10.00000000 18.82115364
13.00000000 15.00485802
16.00000000 19.24893761
20.00000000 22.59035683
17.00000000 59.69598007
17.00000000 34.07574844
18.00000000 24.17820358
13.00000000 24.70093536
11.00000000 23.37569046
14.00000000 34.14352036
13.00000000 33.33922577
16.00000000 36.64311981
20.00000000 60.21446609
20.00000000 33.54150391
18.00000000 40.84828949
21.00000000 40.31245041
34.00000000 91.71004486
40.00000000 93.24317169
42.00000000 43.94712067
12.00000000 32.73310471
7.00000000 25.25534248
9.00000000 23.14623833
I want to calculate, for each of the two columns separately, the mean of the first 10 rows, then of the next 11 rows, then of the next 5 rows, so as to get the following output:
14.9 25.2296802
18 40.2734046
22 43.6649956
How could I do that in Python with pandas? If the groups had a fixed size (e.g. 10 rows each), I would do the following:
import numpy as np
import pandas as pd

df = pd.read_csv(i, sep='\t', header=None)
df_mean = df.groupby(np.arange(len(df)) // 10).mean()
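(For illustration, the fixed-size trick above assigns row i to group i // 10; a minimal self-contained sketch with a hypothetical one-column frame:)

```python
import numpy as np
import pandas as pd

# hypothetical frame of 25 rows with values 0..24
df = pd.DataFrame({'x': range(25)})

# rows 0-9 -> group 0, rows 10-19 -> group 1, rows 20-24 -> group 2
means = df.groupby(np.arange(len(df)) // 10).mean()
print(means)
# group 0 mean = 4.5, group 1 mean = 14.5, group 2 mean = 22.0
```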
CodePudding user response:
Use numpy.repeat to craft groups of arbitrary lengths (here labeled a/b/c):
import numpy as np

# repeat 'a' 10 times, 'b' 11 times and 'c' 5 times: one label per row
means = df.groupby(np.repeat(['a', 'b', 'c'], [10, 11, 5])).mean()
output:
0 1
a 14.9 25.229680
b 18.0 40.273405
c 22.0 43.664996
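The grouping key works because np.repeat expands each label by its count, yielding exactly one label per row (10 + 11 + 5 = 26). A quick sketch of what it produces:

```python
import numpy as np

# each label is tiled by the matching count in the second argument
labels = np.repeat(['a', 'b', 'c'], [10, 11, 5])
print(labels)
# 26 labels: ten 'a', eleven 'b', five 'c'
```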
If you don't care about group names:
groups = [10, 11, 5]  # sizes of the successive groups
means = df.groupby(np.repeat(np.arange(len(groups)), groups)).mean()
output:
0 1
0 14.9 25.229680
1 18.0 40.273405
2 22.0 43.664996
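Putting it all together, here is a self-contained sketch of the whole pipeline, using io.StringIO in place of the file and a small hypothetical dataset (5 rows, grouped as 2 then 3); the data shown in the question is space-separated, so sep=r'\s+' is assumed instead of '\t':

```python
import io
import numpy as np
import pandas as pd

# hypothetical two-column file: 5 rows, grouped as 2 then 3
data = io.StringIO(
    "1.0 10.0\n"
    "3.0 20.0\n"
    "2.0 30.0\n"
    "4.0 40.0\n"
    "6.0 50.0\n"
)
df = pd.read_csv(data, sep=r'\s+', header=None)

sizes = [2, 3]
means = df.groupby(np.repeat(np.arange(len(sizes)), sizes)).mean()
print(means)
# group 0: column means of the first 2 rows -> (2.0, 15.0)
# group 1: column means of the next 3 rows -> (4.0, 40.0)
```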