Cumulative count of different strings in a column based on value of another column


I've got a df that looks like this, with duplicate IDs:

     ID    Usage_type
0     5    Note
1     6    Note
2     7    Service
3     5    Note
4     7    Note
5     10   Service

I want two extra columns that indicate the cumulative count of each Usage_type value for each ID, like so:

     ID    Usage_type   type_Note    type_Service
0     5    Note         1            0
1     6    Note         1            0
2     7    Service      0            1
3     5    Note         2            0
4     7    Note         1            1
5     10   Service      0            1

I've used cumcount() to get the total count of Usage_type for each ID, but I want to break it down further into separate counts for each string.
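
For reference, this is roughly where I am so far (a minimal sketch; the total_count name is just illustrative):

import pandas as pd

df = pd.DataFrame({'ID': [5, 6, 7, 5, 7, 10],
                   'Usage_type': ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']})

# Total running count of rows per ID, regardless of Usage_type
df['total_count'] = df.groupby('ID').cumcount() + 1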

CodePudding user response:

You could split your dataframe on the different Usage_type values, i.e. Note / Service, perform a groupby().cumcount() + 1 on each part, and then put them back together using concat. Lastly, do a grouped ffill() on the result to account for the cases you explained in the comments (e.g. row 4, ID 7, should still carry the Service count of 1 from row 2):

pd.concat(
    [df,
     df.loc[df.Usage_type == 'Note'].groupby('ID').cumcount() + 1,
     df.loc[df.Usage_type == 'Service'].groupby('ID').cumcount() + 1],
    axis=1)\
    .groupby('ID').ffill().fillna(0)\
    .rename(columns={0: 'type_Note', 1: 'type_Service'})

Which prints back:

  Usage_type  type_Note  type_Service
0       Note        1.0           0.0
1       Note        1.0           0.0
2    Service        0.0           1.0
3       Note        2.0           0.0
4       Note        1.0           1.0
5    Service        0.0           1.0
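
As an aside (not part of the original answer, just a sketch of the same idea): you can get the expected table in one shot by one-hot encoding Usage_type and taking a grouped cumulative sum, which also keeps the ID column:

# One indicator column per usage type, then a running sum within each ID
dummies = pd.get_dummies(df['Usage_type'], dtype=int).add_prefix('type_')
out = df.join(dummies.groupby(df['ID']).cumsum())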

EDIT:

If you want to end up with just the two extra columns added to df, you can adjust the above code:

# Count within each (ID, Usage_type) pair so the two types are tallied separately
grp_cc = df.groupby(['ID', 'Usage_type']).cumcount() + 1
df.loc[df.Usage_type == 'Note', 'type_Note'] = grp_cc
df.loc[df.Usage_type == 'Service', 'type_Service'] = grp_cc

This will get you:

   ID Usage_type  type_Note  type_Service
0   5       Note        1.0           NaN
1   6       Note        1.0           NaN
2   7    Service        NaN           1.0
3   5       Note        2.0           NaN
4   7       Note        1.0           NaN
5  10    Service        NaN           1.0

And then performing a grouped forward fill and filling in the null values gets you what you need:

df = df.groupby('ID').ffill().fillna(0)    

>>> df    
 
  Usage_type  type_Note  type_Service
0       Note        1.0           0.0
1       Note        1.0           0.0
2    Service        0.0           1.0
3       Note        2.0           0.0
4       Note        1.0           1.0
5    Service        0.0           1.0
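
Note that df.groupby('ID').ffill() drops the ID column from the result, which is why it is missing above. If you want to keep it, one option (an assumption about the desired output, not part of the original answer) is to fill just the two count columns in place:

# Forward-fill only the count columns within each ID, keeping ID itself intact
df[['type_Note', 'type_Service']] = (
    df.groupby('ID')[['type_Note', 'type_Service']].ffill().fillna(0)
)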

CodePudding user response:

I solved this by creating a custom function; here's the bit of code that worked for me:

import pandas as pd

def truefalse(mask):
    # Walk the boolean mask and write a running count wherever it is True
    count = 0
    df_2 = pd.DataFrame({'col': [0] * len(mask)})
    for i in range(len(mask)):
        if mask[i]:
            df_2.loc[i, 'col'] = count + 1
            count += 1
    return df_2

ID = [5, 6, 7, 5, 7, 10]
usg = ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']
df = pd.DataFrame({'ID': ID, 'Usage_type': usg})

df['type_Note'] = 0
for unid in df['ID'].unique():
    df['type_Note'] = truefalse((df['Usage_type'] == 'Note') & (df['ID'] == unid))['col'] + df['type_Note']

You should repeat it for Usage_type == 'Service' separately to build type_Service (a sketch follows); hope this helps!
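
For completeness, the Service pass would mirror the loop above (a sketch, following the same pattern):

df['type_Service'] = 0
for unid in df['ID'].unique():
    df['type_Service'] = truefalse((df['Usage_type'] == 'Service') & (df['ID'] == unid))['col'] + df['type_Service']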
