Home > database >  pandas: how to add date column into groupby result
pandas: how to add date column into groupby result

Time:12-30

I have a csv file that is user behavior data in a web page. here is the sample data:

_time,dataCenter,customer,user,SID,ACT,
2021-11-25T13:45:42.139 0000,dc1,customer1,user1,sid1,open_page,
2021-11-25T13:45:50.139 0000,dc1,customer1,user1,sid1,create_form,
2021-11-25T13:46:51.139 0000,dc1,customer1,user1,sid1,save_form,
2021-11-25T13:50:50.139 0000,dc1,customer2,user2,sid2,open_page,
2021-11-25T13:51:20.139 0000,dc1,customer2,user2,sid2,open_form_detail,
2021-11-25T13:53:50.139 0000,dc1,customer2,user2,sid2,back_to_form_list,
2021-11-25T23:59:50.139 0000,dc3,customer3,user3,sid3,open_page,
2021-11-26T00:02:50.139 0000,dc3,customer3,user3,sid3,show_more,
......
......
......

I want to do below data transformation:

  1. group ACT by dataCenter,customer,user and SID
  2. extract date from _time column and assign to groupby result.

Here is the expected Result:

date,dataCenter,customer,user,SID,ACT,
2021-11-25,dc1,customer1,user1,sid1,"open_page,create_form,save_form",
2021-11-25,dc1,customer2,user2,sid2,"open_page,open_form_detail,back_to_form_list"
2021-11-25,dc3,customer3,user3,sid3,"open_page,show_more"
......
......
......

what i have tried:

df= df.groupby(['dataCenter','customer','user','sid'])['ACT'].apply(','.join)

but i am not sure how to add date as a column in groupby result.

Can you please help advice?

Thanks Cherie

CodePudding user response:

IIUC:

df = df.groupby(['dataCenter', 'customer', 'user', 'SID']).agg(date = ('_time', 'first'),
                                                               ACT= ('ACT', ','.join)).reset_index()
df['date'] = pd.to_datetime(df['date']).dt.date

OUTPUT

  dataCenter   customer   user   SID       date                                           ACT
0        dc1  customer1  user1  sid1  2021-11-25               open_page,create_form,save_form
1        dc1  customer2  user2  sid2  2021-11-25  open_page,open_form_detail,back_to_form_list
2        dc3  customer3  user3  sid3  2021-11-25                           open_page,show_more
  • Related