I want to process every N rows of a DataFrame separately.
If my data has 15 rows indexed from 0 to 14, I want to process rows 0 to 3, 4 to 7, 8 to 11, and 12 to 14.
For example, for every 4 rows I want sum(A) and mean(B):
Index | A | B |
---|---|---|
0 | 4 | 4 |
1 | 7 | 9 |
2 | 9 | 3 |
3 | 0 | 4 |
4 | 7 | 9 |
5 | 9 | 2 |
6 | 3 | 0 |
7 | 7 | 4 |
8 | 7 | 2 |
9 | 1 | 6 |
The resulting DataFrame should be:
Index | A | B |
---|---|---|
0 | 20 | 5 |
1 | 26 | 3.75 |
2 | 8 | 4 |
TL;DR: how can I make DataFrame.apply take multiple rows at a time instead of a single row?
CodePudding user response:
Use GroupBy.agg, grouping by the integer division of the index by 4, so that every 4 consecutive rows share the same group key:
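import numpy as np

# Option 1: default RangeIndex (0, 1, 2, ...), group by index label // 4
df = df.groupby(df.index // 4).agg({'A': 'sum', 'B': 'mean'})
# Option 2 (use instead of option 1): any index, group by row position // 4
df = df.groupby(np.arange(len(df.index)) // 4).agg({'A': 'sum', 'B': 'mean'})
print(df)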
    A     B
0  20  5.00
1  26  3.75
2   8  4.00
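For reference, here is a minimal self-contained sketch of the positional variant, using the sample data from the question. The helper name `aggregate_chunks` and the string index labels are assumptions added for illustration only; the labels are there to show that grouping by row position does not depend on the index values.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with a non-numeric index (an assumption,
# added to show that the positional approach works for any index).
df = pd.DataFrame(
    {
        "A": [4, 7, 9, 0, 7, 9, 3, 7, 7, 1],
        "B": [4, 9, 3, 4, 9, 2, 0, 4, 2, 6],
    },
    index=list("abcdefghij"),
)


def aggregate_chunks(frame, n):
    """Hypothetical helper: sum A and average B over blocks of n rows."""
    # Row positions 0..len-1 floor-divided by n give one key per block:
    # 0,0,0,0,1,1,1,1,2,2 for 10 rows and n=4.
    block_ids = np.arange(len(frame)) // n
    return frame.groupby(block_ids).agg({"A": "sum", "B": "mean"})


result = aggregate_chunks(df, 4)
print(result)
```

The last block simply contains the leftover rows (here 2 instead of 4), so its sum and mean are computed over fewer rows.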