How to count elements in a one-to-many relationship DataFrame in Python

Time:06-16

I have a DataFrame that contains a client code, the contract number, and the products on the contract.

Something like this:

client_code contract_number product
AAAA 1000 Water
AAAA 1000 Soda
AAAA 1000 Food
BACD 1001 Water
BACD 1001 Soda
DAMR 1002 Food

I want to add a column that numbers the products on each contract from 1 to n. Something like this:

client_code contract_number product count
AAAA 1000 Water 1
AAAA 1000 Soda 2
AAAA 1000 Food 3
BACD 1001 Water 1
BACD 1001 Soda 2
DAMR 1002 Food 1

I've tried a for loop, but it's too slow (about an hour).

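df['count'] = 0
k = 1
# Compare each row with the next one; assumes a default 0..n-1 integer index.
for i in range(0, len(df) - 1):
    if df.at[i, 'contract_number'] == df.at[i + 1, 'contract_number']:
        df.at[i, 'count'] = k
        k = k + 1
    else:
        df.at[i, 'count'] = k
        k = 1

# The last row is not visited by the loop, so set it here.
df.at[i + 1, 'count'] = k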

Any idea how to speed up my code?

PS: My DataFrame contains 500,000 rows.

Thank you!

CodePudding user response:

IIUC, you want a cumulative count within each client_code (or, more likely, each contract_number). You can do that with groupby followed by the cumcount function:

df.loc[:, 'count'] = df.groupby('client_code').cumcount() + 1

Output

  client_code  contract_number product  count
0        AAAA             1000   Water      1
1        AAAA             1000    Soda      2
2        AAAA             1000    Food      3
3        BACD             1001   Water      1
4        BACD             1001    Soda      2
5        DAMR             1002    Food      1
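
Since the question asks for a count per contract, here is a minimal self-contained sketch that groups by contract_number instead, using the sample rows from the question. The second column, count_by_run, is an illustrative name that is not in the original post: it restarts the counter whenever the contract number changes between adjacent rows, which is closer to what the loop does if the same contract number could reappear further down in unsorted data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "client_code": ["AAAA", "AAAA", "AAAA", "BACD", "BACD", "DAMR"],
    "contract_number": [1000, 1000, 1000, 1001, 1001, 1002],
    "product": ["Water", "Soda", "Food", "Water", "Soda", "Food"],
})

# cumcount() numbers the rows of each group 0..n-1 in their original order,
# so adding 1 gives the 1..n position within each contract.
df["count"] = df.groupby("contract_number").cumcount() + 1

# Variant that mimics the loop (count_by_run is a hypothetical column name):
# start a new run whenever the contract number differs from the previous row,
# then number the rows inside each run.
run_id = (df["contract_number"] != df["contract_number"].shift()).cumsum()
df["count_by_run"] = df.groupby(run_id).cumcount() + 1

print(df)
```

On data where each contract's rows are contiguous, as in the sample, both columns should agree. Both approaches are vectorized, so they avoid the per-row Python loop.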