How to count elements in a one to many relationship dataframe in python-CodePudding

i have a dataframe which contains a client code, the number of the contract and the products from the contract.

Something like this :

client_code	contract_number	product
AAAA	1000	Water
AAAA	1000	Soda
AAAA	1000	Food
BACD	1001	Water
BACD	1001	Soda
DAMR	1002	Food

And I want to add a column which contains a count to see how many products are on a contract from 1 to n . Something like this:

client_code	contract_number	product	count
AAAA	1000	Water	1
AAAA	1000	Soda	2
AAAA	1000	Food	3
BACD	1001	Water	1
BACD	1001	Soda	2
DAMR	1002	Food	1

I've tried with a for loop but it's too slow ( like an hour ).

df['count']=0
k=1
for i in range(0,len(df)-1):
    
    if(df.at[i,'contract_number']==df.at[i 1,'contract_number']):
        df.at[i,'count']=k
        k=k 1    
    else : 
        df.at[i,'count']=k
        k=1
        
df.at[i 1,'count']=k

Any ideea how to speed up my code ?

PS : My data frame contains 500.000 lines .

Thank you !

CodePudding user response：

IIUC, you want a cumulative count by each client_code (or probably contract_number) - you can do that with the cumcount function:

df.loc[:, 'count'] = df.groupby('client_code').cumcount()   1

Output

  client_code  contract_number product  count
0        AAAA             1000   Water      1
1        AAAA             1000    Soda      2
2        AAAA             1000    Food      3
3        BACD             1001   Water      1
4        BACD             1001    Soda      2
5        DAMR             1002    Food      1