I have a csv file that is user behavior data in a web page. here is the sample data:
_time,dataCenter,customer,user,SID,ACT,
2021-11-25T13:45:42.139 0000,dc1,customer1,user1,sid1,open_page,
2021-11-25T13:45:50.139 0000,dc1,customer1,user1,sid1,create_form,
2021-11-25T13:46:51.139 0000,dc1,customer1,user1,sid1,save_form,
2021-11-25T13:50:50.139 0000,dc1,customer2,user2,sid2,open_page,
2021-11-25T13:51:20.139 0000,dc1,customer2,user2,sid2,open_form_detail,
2021-11-25T13:53:50.139 0000,dc1,customer2,user2,sid2,back_to_form_list,
2021-11-25T23:59:50.139 0000,dc3,customer3,user3,sid3,open_page,
2021-11-26T00:02:50.139 0000,dc3,customer3,user3,sid3,show_more,
......
......
......
I want to do below data transformation:
- group ACT by dataCenter,customer,user and SID
- extract date from _time column and assign to groupby result.
Here is the expected Result:
date,dataCenter,customer,user,SID,ACT,
2021-11-25,dc1,customer1,user1,sid1,"open_page,create_form,save_form",
2021-11-25,dc1,customer2,user2,sid2,"open_page,open_form_detail,back_to_form_list"
2021-11-25,dc3,customer3,user3,sid3,"open_page,show_more"
......
......
......
what i have tried:
df= df.groupby(['dataCenter','customer','user','sid'])['ACT'].apply(','.join)
but i am not sure how to add date
as a column in groupby
result.
Can you please help advice?
Thanks Cherie
CodePudding user response:
IIUC:
df = df.groupby(['dataCenter', 'customer', 'user', 'SID']).agg(date = ('_time', 'first'),
ACT= ('ACT', ','.join)).reset_index()
df['date'] = pd.to_datetime(df['date']).dt.date
OUTPUT
dataCenter customer user SID date ACT
0 dc1 customer1 user1 sid1 2021-11-25 open_page,create_form,save_form
1 dc1 customer2 user2 sid2 2021-11-25 open_page,open_form_detail,back_to_form_list
2 dc3 customer3 user3 sid3 2021-11-25 open_page,show_more