Home > Mobile >  Pandas merge multiple rows based on two columns
Pandas merge multiple rows based on two columns

Time:05-24

I have a dataframe shown below

     pid          date      label   age
0   0001    2001-09-24      False   34
1   0001    2006-03-02      True    39
2   0001    2006-03-02      True    39
3   0002    2003-02-07      True    23
4   0002    2004-08-02      True    24
5   0002    2004-08-02      False   24
6   0003    2001-05-25      False   25
7   0003    2001-05-25      False   25

Which I want to aggregate based on both pid and date, where if multiple rows have the same pid and data then age stays the same and label = True if any or all of these rows have True label:

     pid          date      label   age
0   0001    2001-09-24      False   34
1   0001    2006-03-02      True    39
3   0002    2003-02-07      True    23
4   0002    2004-08-02      True    24
5   0003    2001-05-25      False   25

I have tried groupby but all I could do is with one column - pid and no success with two columns.

CodePudding user response:

In your case sort_values drop_duplicates

out = df.sort_values('label').drop_duplicates(['pid','date','age'],keep='last').sort_index()
Out[240]: 
   pid        date  label  age
0    1  2001-09-24  False   34
2    1  2006-03-02   True   39
3    2  2003-02-07   True   23
4    2  2004-08-02   True   24
7    3  2001-05-25  False   25
  • Related