I have a pandas dataframe with something like the following:
index | order_id | cost |
---|---|---|
123a | 123 | 5 |
123b | 123 | None |
123c | 123 | 3 |
124a | 124 | None |
124b | 124 | None |
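For reference, here's a minimal way to build this sample (an assumption on my part: index is an ordinary column rather than the dataframe's index, which matches the output shown in the second answer):

import pandas as pd

ol3 = pd.DataFrame({
    'index': ['123a', '123b', '123c', '124a', '124b'],
    'order_id': [123, 123, 123, 124, 124],
    'cost': [5, None, 3, None, None],  # None is read in as NaN (float column)
})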
For each unique value of order_id, I'd like to drop every row that doesn't have the lowest cost. For any order_id that contains only nulls in cost, any one of its rows can be retained.
I've been struggling with this for a while now.
# keep the min-cost row per order_id
ol3 = ol3.loc[ol3.groupby('order_id').cost.idxmin()]
This code doesn't play nicely with the order_ids that have only nulls. So I tried to figure out how to drop the nulls I don't want with
# one null row per order_id
ol4 = ol3.loc[ol3['cost'].isna()].drop_duplicates(subset=['order_id', 'cost'], keep='first')
This gives me the null rows I want to retain, one per order_id. I'm not sure where to go from here; presumably the two pieces could be combined, something like the sketch below, but I'm pretty sure I'm looking at this the wrong way. Any help would be appreciated!
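A sketch of that combination (untested beyond the sample data; it assumes the column is spelled order_id as in the table above):

import pandas as pd

# count of non-null costs per order_id, aligned to every row
n_costs = ol3.groupby('order_id')['cost'].transform('count')

# orders with at least one real cost: keep the min-cost row
best = ol3.loc[ol3[n_costs > 0].groupby('order_id')['cost'].idxmin()]

# all-null orders: keep one row each
nulls = ol3[n_costs == 0].drop_duplicates(subset='order_id', keep='first')

result = pd.concat([best, nulls]).sort_index()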
CodePudding user response:
You can use transform to get the minimum cost per order_id, aligned to every row. We additionally need an isna check for the special order_ids that have only NaNs:
order_mins = df.groupby('order_id').cost.transform('min')
df[(df.cost == order_mins) | (order_mins.isna())]
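Applied to the sample frame (taking df to be the questioner's ol3), this keeps 123c plus both all-NaN rows of order 124. If only one row per all-null order should survive, a drop_duplicates afterwards narrows it, e.g.:

df = ol3  # the sample frame from the question

order_mins = df.groupby('order_id').cost.transform('min')
kept = df[(df.cost == order_mins) | (order_mins.isna())]

# optional: collapse the all-NaN orders to a single row each
kept = kept.drop_duplicates(subset='order_id', keep='first')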
CodePudding user response:
You can (temporarily) fill the NA/None with np.inf before getting the idxmin:
import numpy as np

# all-NaN groups become all-inf, so idxmin still returns their first row
ol3.loc[ol3['cost'].fillna(np.inf).groupby(ol3['order_id']).idxmin()]
You will have exactly one row per order_id. Output:
  index  order_id  cost
2  123c       123   3.0
3  124a       124   NaN
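For comparison, an equivalent sketch without the inf sentinel: sort by cost (NaNs go last by default), keep the first row per order_id, then restore the original row order:

ol3.sort_values('cost').drop_duplicates(subset='order_id', keep='first').sort_index()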