I have a table like below where I want to rank the PartNumber groups in a Category while keeping the LineItem order.
This is the starting table -->
Category | Line Item | Part Number |
---|---|---|
Apple | 1 | Granny Smith |
Apple | 2 | Granny Smith |
Apple | 3 | Fuji |
Apple | 4 | Fuji |
Banana | 1 | Chiquita |
Banana | 2 | Chiquita |
Banana | 3 | Dole |
Banana | 4 | Dole |
Banana | 5 | Del Monto |
Banana | 6 | Del Monto |
And this is my target table -->
Category | Line Item | Part Number | Rank Order |
---|---|---|---|
Apple | 1 | Granny Smith | 1 |
Apple | 2 | Granny Smith | 1 |
Apple | 3 | Fuji | 2 |
Apple | 4 | Fuji | 2 |
Banana | 1 | Chiquita | 1 |
Banana | 2 | Chiquita | 1 |
Banana | 3 | Dole | 2 |
Banana | 4 | Dole | 2 |
Banana | 5 | Del Monto | 3 |
Banana | 6 | Del Monto | 3 |
Keeping the Line Item order is required in this case. I've tried to use SQL partitions and ranks, but they are ordering by the Part Number and not the Line Item.
SELECT
Category,
LineItem,
PartNumber,
DENSE_RANK() OVER (PARTITION BY Category ORDER BY PartNumber ASC)
FROM TABLE
If this is not possible, is there a similar option in Pandas that would yield a similar result?
CodePudding user response:
In pandas
use groupby transform
factorize
as recommended by @Scott Boston's answer to Counting occurrence of values after using groupby on multiple pandas columns:
df['Rank Order'] = (
df.groupby('Category')['Part Number']
.transform(lambda s: s.factorize()[0] 1)
)
*factorize
will convert each group of Part Numbers into an enumerated type ensuring Ranks appear in numerical ascending order by appearance. This as opposed to groupby rank
which will use an ordering method.
df
:
Category Line Item Part Number Rank Order
0 Apple 1 Granny Smith 1
1 Apple 2 Granny Smith 1
2 Apple 3 Fuji 2
3 Apple 4 Fuji 2
4 Banana 1 Chiquita 1
5 Banana 2 Chiquita 1
6 Banana 3 Dole 2
7 Banana 4 Dole 2
8 Banana 5 Del Monto 3
9 Banana 6 Del Monto 3
Setup and imports:
import pandas as pd
df = pd.DataFrame({
'Category': ['Apple', 'Apple', 'Apple', 'Apple', 'Banana', 'Banana',
'Banana', 'Banana', 'Banana', 'Banana'],
'Line Item': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
'Part Number': ['Granny Smith', 'Granny Smith', 'Fuji', 'Fuji', 'Chiquita',
'Chiquita', 'Dole', 'Dole', 'Del Monto', 'Del Monto']
})
CodePudding user response:
It sounds like what you need is to order the DENSE_RANK
by the minimum LineItem
per Category, PartNumber
SELECT
Category,
LineItem,
PartNumber,
DENSE_RANK() OVER (PARTITION BY Category ORDER BY MinLineItem)
FROM (
SELECT *,
MinLineItem = MIN(LineItem) OVER (PARTITION BY Category, PartNumber)
FROM [TABLE]
) t