Home > database >  SQL Server or Pandas Rank / Numbering a Window Function by Partition
SQL Server or Pandas Rank / Numbering a Window Function by Partition

Time:10-10

I have a table like below where I want to rank the PartNumber groups in a Category while keeping the LineItem order.

This is the starting table -->

Category Line Item Part Number
Apple 1 Granny Smith
Apple 2 Granny Smith
Apple 3 Fuji
Apple 4 Fuji
Banana 1 Chiquita
Banana 2 Chiquita
Banana 3 Dole
Banana 4 Dole
Banana 5 Del Monto
Banana 6 Del Monto

And this is my target table -->

Category Line Item Part Number Rank Order
Apple 1 Granny Smith 1
Apple 2 Granny Smith 1
Apple 3 Fuji 2
Apple 4 Fuji 2
Banana 1 Chiquita 1
Banana 2 Chiquita 1
Banana 3 Dole 2
Banana 4 Dole 2
Banana 5 Del Monto 3
Banana 6 Del Monto 3

Keeping the Line Item order is required in this case. I've tried to use SQL partitions and ranks, but they are ordering by the Part Number and not the Line Item.

SELECT 
  Category,
  LineItem,
  PartNumber,
  DENSE_RANK() OVER (PARTITION BY Category ORDER BY PartNumber ASC)
FROM TABLE

If this is not possible, is there a similar option in Pandas that would yield a similar result?

CodePudding user response:

In pandas use groupby transform factorize as recommended by @Scott Boston's answer to Counting occurrence of values after using groupby on multiple pandas columns:

df['Rank Order'] = (
    df.groupby('Category')['Part Number']
        .transform(lambda s: s.factorize()[0]   1)
)

*factorize will convert each group of Part Numbers into an enumerated type ensuring Ranks appear in numerical ascending order by appearance. This as opposed to groupby rank which will use an ordering method.

df:

  Category  Line Item   Part Number  Rank Order
0    Apple          1  Granny Smith           1
1    Apple          2  Granny Smith           1
2    Apple          3          Fuji           2
3    Apple          4          Fuji           2
4   Banana          1      Chiquita           1
5   Banana          2      Chiquita           1
6   Banana          3          Dole           2
7   Banana          4          Dole           2
8   Banana          5     Del Monto           3
9   Banana          6     Del Monto           3

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'Category': ['Apple', 'Apple', 'Apple', 'Apple', 'Banana', 'Banana',
                 'Banana', 'Banana', 'Banana', 'Banana'],
    'Line Item': [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
    'Part Number': ['Granny Smith', 'Granny Smith', 'Fuji', 'Fuji', 'Chiquita',
                    'Chiquita', 'Dole', 'Dole', 'Del Monto', 'Del Monto']
})

CodePudding user response:

It sounds like what you need is to order the DENSE_RANK by the minimum LineItem per Category, PartNumber

SELECT 
  Category,
  LineItem,
  PartNumber,
  DENSE_RANK() OVER (PARTITION BY Category ORDER BY MinLineItem)
FROM (
    SELECT *,
      MinLineItem = MIN(LineItem) OVER (PARTITION BY Category, PartNumber)
    FROM [TABLE]
) t

db<>fiddle

  • Related