Group by categorical column and by range of ids

Time:08-09

I have a dataframe similar to this one:

     col  inc_idx
0    A    1
1    B    1
2    C    1
3    A    2
4    A    3
5    B    2
6    D    1
7    E    1
8    F    1
9    F    2
10   Z    1

And I'm trying to iterate the df by batches:

First loop: All col rows with inc_idx >= 1 and inc_idx <=2

A 1
A 2
B 1
B 2
...

Second loop: All col rows with inc_idx >= 3 and inc_idx <=4

A 3

The way I'm doing it now leaves a lot of room for improvement:

i = 0
while True:
    for col, grouped_rows in df.groupby(by=['col']):
        from_idx = i * 2
        to_idx = from_idx + 2
        items = grouped_rows.iloc[from_idx:to_idx].to_list()
    i += 2

I think there has to be a more efficient approach, and also a way to remove the "while True" loop and instead just wait for the internal loop to run out of items.

CodePudding user response:

I don't know exactly what you want to do. Here's something that groups the rows.

df.groupby((df.inc_idx + 1) // 2).agg(list)
                                    col                         inc_idx
inc_idx                                                                
1        [A, B, C, A, B, D, E, F, F, Z]  [1, 1, 1, 2, 2, 1, 1, 1, 2, 1]
2                                   [A]                             [3]
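
If the goal is to iterate batch by batch, the same grouping key can drive a plain for loop, which stops on its own once the batches run out, so no "while True" is needed. This is only a sketch: the sample frame is rebuilt from the question, and batch_size and batches are names made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col": list("ABCAABDEFFZ"),
    "inc_idx": [1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 1],
})

batch_size = 2  # assumed batch width (inc_idx 1-2, 3-4, ...)

# Ceiling division: inc_idx 1 and 2 map to batch 1, 3 and 4 to batch 2.
# With batch_size = 2 this equals (df.inc_idx + 1) // 2.
batch_key = (df["inc_idx"] + batch_size - 1) // batch_size

batches = {}
for batch_no, rows in df.groupby(batch_key):
    # Sort so each batch reads A 1, A 2, B 1, B 2, ... as in the question.
    batches[int(batch_no)] = rows.sort_values(["col", "inc_idx"])
```

Each value in batches is an ordinary DataFrame, so it can be processed inside the loop instead of being stored if memory matters.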

CodePudding user response:

I've found (I think) a simpler way to solve it. I'll add a new "batch" column:

df['batch'] = df.apply(lambda x: x['inc_idx'] // 2, axis=1)

With this new column, now I can simply do something like:

df.groupby(by=['col', 'batch'])
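
One thing to watch: with inc_idx starting at 1, plain inc_idx // 2 sends 1 to batch 0 and both 2 and 3 to batch 1. If the batches should line up with the ranges from the question (1-2, then 3-4), subtracting 1 first may be closer to the intent. A possible sketch, using a vectorized expression instead of apply; the sample frame is rebuilt from the question and items_by_group is a name made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col": list("ABCAABDEFFZ"),
    "inc_idx": [1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 1],
})

# Assumes inc_idx starts at 1: values 1, 2 -> batch 0; values 3, 4 -> batch 1.
df["batch"] = (df["inc_idx"] - 1) // 2

# Iterating the groupby ends by itself when the groups are exhausted.
items_by_group = {
    (col, int(batch)): rows["inc_idx"].tolist()
    for (col, batch), rows in df.groupby(["col", "batch"])
}
```

Here items_by_group[("A", 0)] holds the inc_idx values of A in the first batch and items_by_group[("A", 1)] those in the second.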