I've got a df that looks like this, with duplicate IDs:
   ID Usage_type
0   5       Note
1   6       Note
2   7    Service
3   5       Note
4   7       Note
5  10    Service
I want two extra columns that indicate the cumulative count of each Usage_type for each ID, like so:
   ID Usage_type  type_Note  type_Service
0   5       Note          1             0
1   6       Note          1             0
2   7    Service          0             1
3   5       Note          2             0
4   7       Note          1             1
5  10    Service          0             1
I've used a cumulative count to get the total count per ID, but I want to break it down further into separate counts for each Usage_type string.
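For reference, here is roughly how I got the total cumulative count per ID so far (the id_count column name is just for illustration):
import pandas as pd

df = pd.DataFrame({
    'ID': [5, 6, 7, 5, 7, 10],
    'Usage_type': ['Note', 'Note', 'Service', 'Note', 'Note', 'Service'],
})

# Running count of rows seen so far for each ID, regardless of Usage_type
df['id_count'] = df.groupby('ID').cumcount() + 1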
CodePudding user response:
You could split your dataframe on the different Usage_type values, i.e. Note / Service, perform a groupby().cumcount() + 1 on each part, and then put them back together using concat. Lastly, do a grouped ffill() on the result to account for the cases you explained in the comments:
# Per-ID cumulative count within each Usage_type, concatenated back onto df,
# then forward-filled per ID so earlier counts carry over to later rows
pd.concat(
    [df,
     df.loc[df.Usage_type == 'Note'].groupby('ID').cumcount() + 1,
     df.loc[df.Usage_type == 'Service'].groupby('ID').cumcount() + 1],
    axis=1)\
    .groupby('ID').ffill().fillna(0)\
    .rename(columns={0: 'type_Note', 1: 'type_Service'})
Which prints back:
  Usage_type  type_Note  type_Service
0       Note        1.0           0.0
1       Note        1.0           0.0
2    Service        0.0           1.0
3       Note        2.0           0.0
4       Note        1.0           1.0
5    Service        0.0           1.0
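If you'd rather have integer counts instead of the trailing .0s, you can cast at the end; a minimal sketch, assuming you assign the result of the chain above to a variable, say out (a name introduced just for this example):
# After fillna(0) there are no NaNs left, so the float counts can safely become ints
out = out.astype({'type_Note': int, 'type_Service': int})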
EDIT:
If you just want to add the two extra columns to df itself rather than build a new frame with concat, you could adjust the above code:
# Running count per (ID, Usage_type) pair, assigned only to the matching rows
grp_cc = df.groupby(['ID', 'Usage_type']).cumcount() + 1
df.loc[df.Usage_type == 'Note', 'type_Note'] = grp_cc
df.loc[df.Usage_type == 'Service', 'type_Service'] = grp_cc
This will get you:
   ID Usage_type  type_Note  type_Service
0   5       Note        1.0           NaN
1   6       Note        1.0           NaN
2   7    Service        NaN           1.0
3   5       Note        2.0           NaN
4   7       Note        1.0           NaN
5  10    Service        NaN           1.0
And then performing a grouped forward fill and filling in the null values gets you what you need:
df = df.groupby('ID').ffill().fillna(0)
>>> df
  Usage_type  type_Note  type_Service
0       Note        1.0           0.0
1       Note        1.0           0.0
2    Service        0.0           1.0
3       Note        2.0           0.0
4       Note        1.0           1.0
5    Service        0.0           1.0
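One thing to be aware of: the grouped ffill() drops the ID column (it is the group key), which is why it is missing from the outputs above. If you want to keep it, a minimal sketch adjusting that last step:
ids = df['ID']                        # remember the group key before it gets dropped
df = df.groupby('ID').ffill().fillna(0)
df.insert(0, 'ID', ids)               # reattach ID as the first column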
CodePudding user response:
I found an answer to your problem and solved it by creating a custom function; here's the bit of code that worked for me:
def truefalse(mask):
    # Cumulative count of the True values in a boolean Series, 0 where False
    count = 0
    df_2 = pd.DataFrame({'col': [0] * len(mask)})
    for i in range(len(mask)):
        if mask.iloc[i]:
            df_2.loc[i, 'col'] = count + 1
            count += 1
    return df_2
ID = [5, 6, 7, 5, 7, 10]
usg = ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']
df = pd.DataFrame({'ID': ID, 'Usage_type': usg})
df['type_Note'] = 0
for unid in df['ID'].unique():
    df['type_Note'] = truefalse((df['Usage_type'] == 'Note') & (df['ID'] == unid))['col'] + df['type_Note']
You should run this separately for each Usage_type (the loop above handles 'Note'; see the sketch below for 'Service'), hope this helps!
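For completeness, here is what the 'Service' repetition would look like, following the same pattern:
# Same idea as the Note loop, accumulating per-ID running counts for Service rows
df['type_Service'] = 0
for unid in df['ID'].unique():
    df['type_Service'] = truefalse((df['Usage_type'] == 'Service') & (df['ID'] == unid))['col'] + df['type_Service']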