Count Rank in pandas-CodePudding

I have a data frame like this. Invoices are transformed into a table where each row has a single product for each customer. This table consists of millions of rows -

customer	invoice	products
a	INVC001	xx
a	INVC001	yy
a	INVC002	zz
a	INVC003	aa
b	INVC004	xx
b	INVC004	aa
b	INVC004	bb
b	INVC005	cc

Now I want to create a column where I want to rank each customers invoices. The table I want

customer	invoice	products	rank
a	INVC001	xx	1
a	INVC001	yy	1
a	INVC002	zz	2
a	INVC003	aa	3
b	INVC004	xx	1
b	INVC004	aa	1
b	INVC004	bb	1
b	INVC005	cc	2

I tried to use cumcount, with grouping customer and invoices with this code df.groupby(['customer','invoice'])['invoice'].cumcount() 1, also tried nggroup() but couldnot generate my expected output.

Is there any numpy/pandas way to create a table like this? Or, is there any optimized way to do this. Will appreciate the help.

CodePudding user response：

Use factorize in GroupBy.transform:

df['rank'] = df.groupby('customer')['invoice'].transform(lambda x: pd.factorize(x)[0])   1

For me rank failed:

df['rank'] = df.groupby('customer')['invoice'].rank(method='dense')

DataError: No numeric types to aggregate

Solution should be like mentioned @pavel:

df['rank'] = df.groupby('customer')['invoice'].rank(method='dense', numeric_only=False)