ID | Check 1 | Check 2 | Value |
---|---|---|---|
1 | | Y | 1 |
1 | Y | Y | 2 |
2 | | Y | 5 |
2 | | | 8 |
I have a dataframe similar to the above table. I need to find a way to collapse the table so that I have only one row for each ID, and the Value column contains the minimum value for that ID.
I'm able to do this using:
df = df.fillna('')
df.groupby(['ID'],as_index=False).min()
However I also need to preserve any of the 'Y' values in my two Check columns, so that a Y will always take priority and fill in any null values on a row with the same ID. So based on the table above, the result I'm looking for would be like this:
ID | Check 1 | Check 2 | Value |
---|---|---|---|
1 | Y | Y | 1 |
2 | | Y | 5 |
So far, what I end up with is blanks in my Check 1 column instead of a Y value. Any ideas on how to do this?
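For reference, here is a minimal sketch that reproduces the behaviour described above. The sample data is reconstructed from the table, and it assumes the blank cells are stored as NaN; the variable name `collapsed` is only illustrative. After `fillna('')`, the empty string compares lower than `'Y'`, so `min()` keeps the blank.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table above (assumption: blanks are NaN).
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [np.nan, "Y", np.nan, np.nan],
    "Check 2": ["Y", "Y", "Y", np.nan],
    "Value": [1, 2, 5, 8],
})

# '' sorts before 'Y', so the string minimum per group is the blank.
collapsed = df.fillna("").groupby(["ID"], as_index=False).min()
print(collapsed)
```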
CodePudding user response:
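Group the dataframe by `ID`, then pass `min` as the aggregate for the `Value` column and `first` as the aggregate for the rest of the columns. `first` returns the first non-null entry in each group, so this has to run on the original nulls, before any `fillna('')`.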
>>> df.groupby('ID').agg({'Check 1': 'first', 'Check 2':'first', 'Value': 'min'})
    Check 1 Check 2  Value
ID
1         Y       Y      1
2      None       Y      5
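A self-contained sketch of this approach, for anyone who wants to run it. The sample data is an assumption reconstructed from the question's table (blanks stored as NaN), and `agg_map` is a hypothetical helper name; building the mapping from the column list just avoids typing every check column by hand.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: blanks are NaN).
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [np.nan, "Y", np.nan, np.nan],
    "Check 2": ["Y", "Y", "Y", np.nan],
    "Value": [1, 2, 5, 8],
})

# 'first' for every check column, 'min' for Value.
# 'first' skips missing values, so do not call fillna('') beforehand.
agg_map = {col: "first" for col in df.columns if col not in ("ID", "Value")}
agg_map["Value"] = "min"

result = df.groupby("ID", as_index=False).agg(agg_map)
print(result)
```

If blanks are wanted instead of nulls in the final output, `result.fillna('')` can be applied afterwards.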
CodePudding user response:
If you `sort_values` on the "Value" column before the `groupby`, you can simply keep the `first` valid (non-null) value in each column for every ID:
>>> df.sort_values("Value").groupby("ID").first().fillna('')
   Check 1 Check 2  Value
ID
1        Y       Y      1
2                Y      5
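The same one-liner as a runnable sketch, again assuming the sample data reconstructed from the question with blanks stored as NaN. Sorting first means the first non-null `Value` in each group is also the minimum, while `first()` picks up any `Y` that appears further down the group.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: blanks are NaN).
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [np.nan, "Y", np.nan, np.nan],
    "Check 2": ["Y", "Y", "Y", np.nan],
    "Value": [1, 2, 5, 8],
})

# Sort so the smallest Value comes first within each ID, take the first
# non-null entry per column, then blank out whatever is still missing.
result = df.sort_values("Value").groupby("ID").first().fillna("")
print(result)
```

Note that `ID` ends up as the index here; add `.reset_index()` if it should stay a regular column.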