Sample Data:
      user_id       content_id        date
0  user_44289  cont_3375_16_10  2020-03-06
1  user_44289    cont_1195_1_8  2019-04-18
2  user_44289   cont_3470_2_15  2021-09-18
3  user_44289    cont_310_25_9  2020-09-08
4  user_44289    cont_4350_1_3  2021-06-25
5  user_40584   cont_1399_27_6  2018-11-14
6  user_40584    cont_1808_2_4  2021-05-07
7  user_40584   cont_2615_7_24  2021-10-14
Using the pandas query below, I sort and group the data. It returns all_users_list, which is of type pandas.core.series.Series:
all_users_list = final_data.sort_values(by=['user_id','date','content_id'], ascending=False).groupby(['user_id','date','content_id'], sort=False)[['user_id','content_id','date']].apply(list)
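A likely explanation for the output that follows: iterating over a DataFrame yields its column labels, so `list(frame)` returns the column names, not the row values. Inside `groupby(...).apply(list)` each group is passed in as a DataFrame, so every group collapses to `['user_id', 'content_id', 'date']`. A minimal sketch reproducing this, assuming `final_data` holds the sample rows above with `date` stored as ISO-format strings:

```python
import pandas as pd

# Sample rows from the question (dates assumed to be ISO strings)
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": [
        "cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15", "cont_310_25_9",
        "cont_4350_1_3", "cont_1399_27_6", "cont_1808_2_4", "cont_2615_7_24",
    ],
    "date": [
        "2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
        "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14",
    ],
})

# list() over a DataFrame iterates its column labels, not its rows
print(list(final_data))  # ['user_id', 'content_id', 'date']

# Same query shape as above: each group is a DataFrame, so apply(list)
# turns every group into its list of column names
grouped = (
    final_data
    .sort_values(by=["user_id", "date", "content_id"], ascending=False)
    .groupby(["user_id", "date", "content_id"], sort=False)[["user_id", "content_id", "date"]]
    .apply(list)
)
print(grouped.iloc[0])  # ['user_id', 'content_id', 'date']
```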
Output:
user_id     date        content_id
user_99974  2021-10-09  cont_4104_7_52  [user_id, content_id, date]
            2021-10-04  cont_2253_6_4   [user_id, content_id, date]
            2021-08-30  cont_2311_4_4   [user_id, content_id, date]
            2021-07-22  cont_676_5_31   [user_id, content_id, date]
            2021-05-28  cont_2456_6_1   [user_id, content_id, date]
...
user_10013  2018-12-04  cont_2597_6_8   [user_id, content_id, date]
            2018-09-11  cont_2233_3_8   [user_id, content_id, date]
            2018-08-13  cont_300_1_1    [user_id, content_id, date]
            2018-04-10  cont_2244_16_1  [user_id, content_id, date]
            2018-02-03  cont_3189_6_12  [user_id, content_id, date]
But I need to access the data in the three columns user_id, content_id and date from all_users_list. I tried:
result = all_users_list.values.tolist()
result[0:10]
It always returns the data below, but I need the actual values shown above, grouped by user_id, date and content_id:
[['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date']]
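If the goal is just the sorted rows as lists, the groupby may not be needed at all: grouping on all three columns leaves one row per group unless the same (user_id, date, content_id) triple appears more than once, so sorting alone gives the same order. A sketch under that assumption, again using the sample data with string dates:

```python
import pandas as pd

# Sample rows from the question (dates assumed to be ISO strings)
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": [
        "cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15", "cont_310_25_9",
        "cont_4350_1_3", "cont_1399_27_6", "cont_1808_2_4", "cont_2615_7_24",
    ],
    "date": [
        "2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
        "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14",
    ],
})

# Sort the same way as the question's query
ordered = final_data.sort_values(by=["user_id", "date", "content_id"], ascending=False)

# Select the three columns and convert to a list of row lists
rows = ordered[["user_id", "content_id", "date"]].to_numpy().tolist()
print(rows[0])  # ['user_44289', 'cont_3470_2_15', '2021-09-18']
```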
Please help with this. Thanks.
Update:
import numpy as np

def getContent(user):
    indices = np.where(result == user)
    return result[indices][1]  ## this should return the list of content_id for the given user_id, e.g. 'user_10013'
But printing result always displays ['user_id', 'content_id', 'date'].
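For the `getContent` goal in the update, a boolean mask on the original DataFrame may be simpler than searching the flattened list. A sketch, assuming the sample data with string dates; `get_content` and its `df`/`user` parameters are hypothetical names, not from the original code:

```python
import pandas as pd

# Sample rows from the question (dates assumed to be ISO strings)
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": [
        "cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15", "cont_310_25_9",
        "cont_4350_1_3", "cont_1399_27_6", "cont_1808_2_4", "cont_2615_7_24",
    ],
    "date": [
        "2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
        "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14",
    ],
})


def get_content(df: pd.DataFrame, user: str) -> list:
    """Hypothetical helper: content_id values for one user, newest first."""
    # Keep only the rows belonging to this user
    user_rows = df[df["user_id"] == user]
    # Newest date first, then return content_id as a plain list
    return user_rows.sort_values("date", ascending=False)["content_id"].tolist()


print(get_content(final_data, "user_40584"))
# ['cont_2615_7_24', 'cont_1808_2_4', 'cont_1399_27_6']
```

A user that is not in the data (such as 'user_10013' with only these sample rows) simply yields an empty list.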
Answer:
Do you want something like this?
out = df.sort_values('date', ascending=False).groupby('user_id').agg(list)
print(out)
# Output
content_id date
user_id
user_40584 [cont_2615_7_24, cont_1808_2_4, cont_1399_27_6] [2021-10-14, 2021-05-07, 2018-11-14]
user_44289 [cont_3470_2_15, cont_4350_1_3, cont_310_25_9,... [2021-09-18, 2021-06-25, 2020-09-08, 2020-03-0...
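Once `out` is built, each cell holds a plain Python list, so one user's values can be read by index label with `.loc`, or the whole column can be turned into a dict. A sketch assuming `df` contains the sample rows with string dates; `content_by_user` is a hypothetical name:

```python
import pandas as pd

# Sample rows from the question (dates assumed to be ISO strings)
df = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": [
        "cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15", "cont_310_25_9",
        "cont_4350_1_3", "cont_1399_27_6", "cont_1808_2_4", "cont_2615_7_24",
    ],
    "date": [
        "2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
        "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14",
    ],
})

# Same aggregation as in the answer: one list per column per user
out = df.sort_values("date", ascending=False).groupby("user_id").agg(list)

# Look up a single user's lists by index label
contents = out.loc["user_40584", "content_id"]
dates = out.loc["user_40584", "date"]
print(contents)  # ['cont_2615_7_24', 'cont_1808_2_4', 'cont_1399_27_6']

# Or map every user_id to its list of content_id values
content_by_user = out["content_id"].to_dict()
```

Because groupby keeps the original row order inside each group, the lists stay newest-first after the initial sort.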