GroupBy a custom lambda function (using strings) in pandas


I have the following DataFrame:

  Sku  Availability
0   1  out of stock
1   1      in stock
2   1      in stock
3   2  out of stock

How can I use a custom aggregate function to create the following DataFrame:

  Sku  Availability
0   1      in stock
2   2  out of stock

(Basically, if a SKU is in stock in any store, its out-of-stock rows should be dropped; the same SKU appears multiple times because each row refers to a different store.)

MVCE:

import pandas as pd

d = {'Sku': ['1', '1', '1', '2'], 'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)
# df = df.groupby('Sku').apply(lambda x: ...)

CodePudding user response:

You can use sort_values to sort your data lexicographically by Availability (so 'in stock' comes before 'out of stock'), then drop_duplicates to keep the first row per Sku:

out = df.sort_values(['Sku', 'Availability']) \
        .drop_duplicates('Sku', ignore_index=True)
print(out)

# Output:
  Sku  Availability
0   1      in stock
1   2  out of stock

A more robust way is to use an ordered CategoricalDtype, so the result does not depend on the alphabetical ordering of the labels:

# Explicit is better than implicit
cat = pd.CategoricalDtype(['in stock', 'out of stock'], ordered=True)
out = df.astype({'Availability': cat}).sort_values(['Sku', 'Availability']) \
        .drop_duplicates('Sku', ignore_index=True)
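
For completeness, here is a minimal sketch of the groupby route hinted at in the question. The lambda and the 'in stock' check are not part of the answer above, just one way to express the same preference as a custom aggregation:

# Sketch: aggregate each Sku with a lambda that returns 'in stock'
# if any store has it, otherwise 'out of stock'
out = (df.groupby('Sku', as_index=False)['Availability']
         .agg(lambda s: 'in stock' if s.eq('in stock').any() else 'out of stock'))

This yields the same two rows as the sort-and-drop approaches above.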