I have the following DataFrame:
  Sku  Availability
0   1  out of stock
1   1      in stock
2   1      in stock
3   2  out of stock
How can I use a custom aggregate function to create the following DataFrame?
  Sku  Availability
0   1      in stock
2   2  out of stock
(Basically, if a SKU is in stock in any store, its out-of-stock rows should be dropped. The same SKU appears several times because each row refers to a different store.)
MVCE:
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'], 'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)
# df = df.groupby('Sku').apply(lambda x: ...)
Answer:
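You can use sort_values to sort your data lexicographically by Sku and Availability, then drop_duplicates to keep only the first row for each Sku. This works because "in stock" sorts before "out of stock" alphabetically ("i" < "o"), so any in-stock row for a SKU comes first: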
out = df.sort_values(['Sku', 'Availability']) \
.drop_duplicates('Sku', ignore_index=True)
print(out)
# Output:
Sku Availability
0 1 in stock
1 2 out of stock
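Since the question asks specifically for a custom aggregate function, the same result can also be obtained with groupby(...).agg and a small helper. This is a minimal sketch: best_availability is a hypothetical helper name, and it assumes only the two labels "in stock" and "out of stock" occur.

```python
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'],
     'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)

def best_availability(s: pd.Series) -> str:
    # Hypothetical helper: a SKU counts as in stock if at least one store has it
    return 'in stock' if (s == 'in stock').any() else 'out of stock'

# agg returns a Series indexed by Sku; reset_index turns it back into two columns
out = df.groupby('Sku')['Availability'].agg(best_availability).reset_index()
print(out)
```

Unlike the sort-based approach, this does not rely on the alphabetical order of the labels, but it collapses each group to a single value instead of keeping one of the original rows (and its index).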
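A more robust way is to use an ordered CategoricalDtype, which states the priority explicitly instead of relying on the alphabetical order of the labels: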
# Explicit is better than implicit
cat = pd.CategoricalDtype(['in stock', 'out of stock'], ordered=True)
out = df.astype({'Availability': cat}).sort_values(['Sku', 'Availability']) \
.drop_duplicates('Sku', ignore_index=True)
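To see why the explicit ordering matters, here is a sketch with a hypothetical extra status, 'backorder', that should rank between 'in stock' and 'out of stock' but sorts first alphabetically. The plain string sort then keeps the wrong row for SKU 1, while the ordered categorical keeps 'in stock'.

```python
import pandas as pd

# 'backorder' is a hypothetical extra status, not part of the question's data
df = pd.DataFrame({'Sku': ['1', '1', '2', '2'],
                   'Availability': ['backorder', 'in stock', 'out of stock', 'backorder']})

# Plain string sort: 'backorder' < 'in stock', so SKU 1 keeps 'backorder'
lex = df.sort_values(['Sku', 'Availability']).drop_duplicates('Sku', ignore_index=True)

# Ordered categorical: rows are sorted by the category order given here
cat = pd.CategoricalDtype(['in stock', 'backorder', 'out of stock'], ordered=True)
out = (df.astype({'Availability': cat})
         .sort_values(['Sku', 'Availability'])
         .drop_duplicates('Sku', ignore_index=True))

print(lex)
print(out)
```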