I have a dataframe similar to this one:
   col  inc_idx
0    A        1
1    B        1
2    C        1
3    A        2
4    A        3
5    B        2
6    D        1
7    E        1
8    F        1
9    F        2
10   Z        1
And I'm trying to iterate the df by batches:
First loop: all col rows with inc_idx >= 1 and inc_idx <= 2
A 1
A 2
B 1
B 2
...
Second loop: all col rows with inc_idx >= 3 and inc_idx <= 4
A 3
The way I'm doing it now leaves a lot of room for improvement:
i = 0
while True:
    for col, grouped_rows in df.groupby(by=['col']):
        from_idx = i * 2
        to_idx = from_idx + 2
        items = grouped_rows.iloc[from_idx:to_idx].to_list()
    i += 2
I think there must be a more efficient approach, and also a way to drop the "while True" loop and simply wait for the inner loop to run out of items.
CodePudding user response:
I don't know exactly what you want to do. Here's something that groups the rows.
df.groupby((df.inc_idx + 1) // 2).agg(list)
                                    col                         inc_idx
inc_idx
1        [A, B, C, A, B, D, E, F, F, Z]  [1, 1, 1, 2, 2, 1, 1, 1, 2, 1]
2                                   [A]                             [3]
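If the goal is to loop over the batches rather than aggregate them, the groupby object can probably be iterated directly, which also removes the need for a "while True" loop. This is only a sketch: it rebuilds the sample frame from the question, and the names batch_key and batches are mine, not from the original post.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "col": list("ABCAABDEFFZ"),
    "inc_idx": [1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 1],
})

# inc_idx 1-2 -> batch 1, inc_idx 3-4 -> batch 2, and so on.
batch_key = (df["inc_idx"] + 1) // 2

batches = {}  # hypothetical container, only here to collect the results
for batch, rows in df.groupby(batch_key):
    # Sort so each batch reads A 1, A 2, B 1, B 2, ...
    rows = rows.sort_values(["col", "inc_idx"])
    batches[batch] = list(zip(rows["col"], rows["inc_idx"]))
    print(batch, batches[batch])
```

The loop ends on its own once the last batch has been visited.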
CodePudding user response:
I've found (I think) a simpler way to solve it. I'll add a new "batch" column:
df['batch'] = (df['inc_idx'] + 1) // 2
With this new column, now I can simply do something like:
df.groupby(by=['col', 'batch'])
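For completeness, here is roughly how iterating that groupby could look. It is a sketch under a couple of assumptions: the batch column is computed so that inc_idx 1 and 2 land in the same batch, and "batch" is put first in the grouping keys so that all groups of the first batch come before those of the second. The list name seen is just for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "col": list("ABCAABDEFFZ"),
    "inc_idx": [1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 1],
})

# Assumption: inc_idx 1-2 -> batch 1, inc_idx 3-4 -> batch 2.
df["batch"] = (df["inc_idx"] + 1) // 2

seen = []  # hypothetical list, only used to record the iteration order
# Groups come out sorted by batch, then by col (groupby sorts keys by default).
for (batch, col), rows in df.groupby(["batch", "col"]):
    items = rows["inc_idx"].tolist()
    seen.append((batch, col, items))
    print(batch, col, items)
```

No outer loop or manual index bookkeeping is needed; the iteration stops when the groups run out.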