Home > other >  Pandas: Row reduction via Groupby
Pandas: Row reduction via Groupby

Time:07-18

For an example, I have a simple DataFrame like:

index data1 replace_me agg_me ID
1 100 ( ) 25 1
2 200 (-) 35 2
3 200 ( ) 45 2
4 300 ( ) 55 3
5 400 ( ) 10 4
6 400 ( ) 10 4
7 400 (-) 10 4
8 400 (-) 10 4

I am trying to aggregate some rows together, whereby there exists a len(groupby of ID) > 1. In the cases where len(groupby ID) > 1, I am looking to:

  1. Add column "agg_me" together
  2. Replace (-) and ( ) with (=)
  3. Enter the (min(agg_me) / sum(agg_me)) into a new column called "Percent".
  4. Do such that it only "pairs" off rows, ie, it doesnt collapse 4 rows -> 1. So as a result:
index data1 replace_me agg_me ID Percent
1 100 ( ) 25 1 0
2 200 (=) 80 2 0.4375
4 300 ( ) 55 3 0
5 400 (=) 20 4 0.5
6 400 (=) 20 4 0.5

Any help is appreciated!

CodePudding user response:

Try this:

vc = df['ID'].map(df['ID'].value_counts()).gt(1)

pd.concat([df.loc[~vc],
    df.loc[vc]
    .groupby(['ID',df.groupby('ID').cumcount().floordiv(2)]).agg(
        index = ('index','first'),
        data1 = ('data1','first'),
        replace_me = ('replace_me',lambda x: '(=)'),
        agg_me = ('agg_me','sum'),
        Percent = ('agg_me',lambda x: x.min()/x.sum()))
            .reset_index(level=0)]).fillna(0).sort_values('ID').reset_index(drop=True)

Output:

   index  data1 replace_me  agg_me  ID  Percent
0      1    100        ( )      25   1   0.0000
1      2    200        (=)      80   2   0.4375
2      4    300        ( )      55   3   0.0000
3      5    400        (=)      20   4   0.5000
4      7    400        (=)      20   4   0.5000
  • Related