Collapse dataframe based on minimum integer value, but also filling null string values of other colu

Time:11-30

ID  Check 1  Check 2  Value
1            Y        1
1   Y        Y        2
2            Y        5
2                     8

I have a dataframe similar to the above table. I need to find a way to collapse the table so that I have only one row for each ID, and the Value column contains the minimum value for that ID.

I'm able to do this using:

df = df.fillna('')
df.groupby(['ID'],as_index=False).min()

However I also need to preserve any of the 'Y' values in my two Check columns, so that a Y will always take priority and fill in any null values on a row with the same ID. So based on the table above, the result I'm looking for would be like this:

ID  Check 1  Check 2  Value
1   Y        Y        1
2            Y        5

What I'm seeing happen so far is that I'll be left with blanks in my Check 1 column, instead of a Y value. Any ideas on how to do this?

CodePudding user response:

Group the dataframe by ID, then aggregate the Value column with min and the remaining columns with first, which returns the first non-null value in each group.

>>> df.groupby('ID').agg({'Check 1': 'first', 'Check 2':'first', 'Value': 'min'})
 
   Check 1 Check 2  Value
ID                       
1        Y       Y      1
2     None       Y      5
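
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is an assumption: the question's table lost its column alignment, so the placement of the blank cells is inferred from the expected result, and the blanks are assumed to be real nulls (None) rather than empty strings.

```python
import pandas as pd

# Sample data rebuilt from the question; null placement is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [None, "Y", None, None],
    "Check 2": ["Y", "Y", "Y", None],
    "Value": [1, 2, 5, 8],
})

# 'first' skips nulls within each group; 'min' picks the smallest Value.
result = df.groupby("ID").agg(
    {"Check 1": "first", "Check 2": "first", "Value": "min"}
)
print(result)
```

Note that first only skips nulls, so it should be applied before any fillna(''); an empty string counts as a valid value and would be returned as-is.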

CodePudding user response:

If you sort_values on the "Value" column before the groupby, first() returns the smallest Value along with the first non-null entry of every other column for each ID:

>>> df.sort_values("Value").groupby("ID").first().fillna('')
   Check 1 Check 2  Value
ID                       
1        Y       Y      1
2                Y      5
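
A rough side-by-side sketch of why the attempt in the question loses the Y values and how the sorted version avoids it. The sample frame is again an assumption (null placement inferred from the expected result, blanks taken to be None), and as_index=False is used only to keep ID as a regular column, as in the question.

```python
import pandas as pd

# Sample data rebuilt from the question; null placement is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [None, "Y", None, None],
    "Check 2": ["Y", "Y", "Y", None],
    "Value": [1, 2, 5, 8],
})

# Attempt from the question: after fillna(''), min() compares strings,
# and '' sorts before 'Y', so any blank in a group wins over a Y.
attempt = df.fillna("").groupby("ID", as_index=False).min()

# Sort by Value, take the first non-null entry per column, then blank
# out whatever is still null (fillna comes last on purpose).
collapsed = (
    df.sort_values("Value")
      .groupby("ID", as_index=False)
      .first()
      .fillna("")
)
print(attempt)
print(collapsed)
```

The order matters: filling nulls with '' before grouping turns them into ordinary values, so neither min nor first can tell them apart from real data.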