Home > Enterprise >  Count Rank in pandas
Count Rank in pandas

Time:04-11

I have a data frame like this. Invoices are transformed into a table where each row has a single product for each customer. This table consists of millions of rows -

customer invoice products
a INVC001 xx
a INVC001 yy
a INVC002 zz
a INVC003 aa
b INVC004 xx
b INVC004 aa
b INVC004 bb
b INVC005 cc

Now I want to create a column where I want to rank each customers invoices. The table I want

customer invoice products rank
a INVC001 xx 1
a INVC001 yy 1
a INVC002 zz 2
a INVC003 aa 3
b INVC004 xx 1
b INVC004 aa 1
b INVC004 bb 1
b INVC005 cc 2

I tried to use cumcount, with grouping customer and invoices with this code df.groupby(['customer','invoice'])['invoice'].cumcount() 1, also tried nggroup() but couldnot generate my expected output.

Is there any numpy/pandas way to create a table like this? Or, is there any optimized way to do this. Will appreciate the help.

CodePudding user response:

Use factorize in GroupBy.transform:

df['rank'] = df.groupby('customer')['invoice'].transform(lambda x: pd.factorize(x)[0])   1

For me rank failed:

df['rank'] = df.groupby('customer')['invoice'].rank(method='dense')

DataError: No numeric types to aggregate

Solution should be like mentioned @pavel:

df['rank'] = df.groupby('customer')['invoice'].rank(method='dense', numeric_only=False)
  • Related