For example, I have a simple DataFrame like:
index | data1 | replace_me | agg_me | ID |
---|---|---|---|---|
1 | 100 | ( ) | 25 | 1 |
2 | 200 | (-) | 35 | 2 |
3 | 200 | ( ) | 45 | 2 |
4 | 300 | ( ) | 55 | 3 |
5 | 400 | ( ) | 10 | 4 |
6 | 400 | ( ) | 10 | 4 |
7 | 400 | (-) | 10 | 4 |
8 | 400 | (-) | 10 | 4 |
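For reproducibility, the data can be built roughly like this (assuming pandas, and treating "index" as an ordinary column rather than the DataFrame index):

import pandas as pd

df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7, 8],
    'data1': [100, 200, 200, 300, 400, 400, 400, 400],
    'replace_me': ['( )', '(-)', '( )', '( )', '( )', '( )', '(-)', '(-)'],
    'agg_me': [25, 35, 45, 55, 10, 10, 10, 10],
    'ID': [1, 2, 2, 3, 4, 4, 4, 4],
})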
I am trying to aggregate rows whenever an ID occurs more than once (i.e. the groupby('ID') group size is > 1). For those groups, I am looking to:
- Add the "agg_me" values together
- Replace (-) and ( ) with (=)
- Enter min(agg_me) / sum(agg_me) into a new column called "Percent" (the arithmetic is checked right after the expected output below).
- Only "pair" off rows, i.e. it shouldn't collapse 4 rows into 1. So as a result:
index | data1 | replace_me | agg_me | ID | Percent |
---|---|---|---|---|---|
1 | 100 | ( ) | 25 | 1 | 0 |
2 | 200 | (=) | 80 | 2 | 0.4375 |
4 | 300 | ( ) | 55 | 3 | 0 |
5 | 400 | (=) | 20 | 4 | 0.5 |
6 | 400 | (=) | 20 | 4 | 0.5 |
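To make the Percent arithmetic concrete, a quick check of the numbers above:

# ID 2: rows 2 and 3 pair up
print(35 + 45)                  # agg_me -> 80
print(min(35, 45) / (35 + 45))  # Percent -> 0.4375

# ID 4: four rows collapse into two pairs of (10, 10)
print(10 + 10)                  # agg_me -> 20
print(min(10, 10) / (10 + 10))  # Percent -> 0.5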
Any help is appreciated!
CodePudding user response:
Try this:
import pandas as pd

# True for rows whose ID occurs more than once
vc = df['ID'].map(df['ID'].value_counts()).gt(1)

pd.concat([
    # rows with a unique ID are kept as-is
    df.loc[~vc],
    # rows with a repeated ID are collapsed in pairs:
    # cumcount().floordiv(2) labels rows 0, 0, 1, 1, ... within each ID
    df.loc[vc]
      .groupby(['ID', df.groupby('ID').cumcount().floordiv(2)])
      .agg(index=('index', 'first'),
           data1=('data1', 'first'),
           replace_me=('replace_me', lambda x: '(=)'),
           agg_me=('agg_me', 'sum'),
           Percent=('agg_me', lambda x: x.min() / x.sum()))
      .reset_index(level=0)
]).fillna(0).sort_values('ID').reset_index(drop=True)
Output:
   index  data1 replace_me  agg_me  ID  Percent
0      1    100        ( )      25   1   0.0000
1      2    200        (=)      80   2   0.4375
2      4    300        ( )      55   3   0.0000
3      5    400        (=)      20   4   0.5000
4      7    400        (=)      20   4   0.5000
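The pairing comes from the second grouping key: df.groupby('ID').cumcount() numbers the rows 0, 1, 2, ... within each ID, and .floordiv(2) turns that into 0, 0, 1, 1, ..., so the groupby collapses rows two at a time instead of merging a whole ID into one row. A quick look at that key on the example data (pair_key is just an illustrative name):

pair_key = df.groupby('ID').cumcount().floordiv(2)
print(pair_key.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1] for the example rows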