I have the following Pandas DF:
visit_date|house_id
---------- ---------
2017-12-27|892815605
2018-01-03|892807836
2018-01-03|892815815
2018-01-03|892812970
2018-01-03|892803143
2018-01-03|892815463
2018-01-03|892816168
2018-01-03|892814475
2018-01-03|892813594
2018-01-03|892813557
2018-01-03|892809834
2018-01-03|892809834
2018-01-03|892803143
2018-01-03|892803143
2018-01-03|892800500
2018-01-03|892806236
2018-01-03|892810789
2018-01-03|892797487
2018-01-03|892815182
2018-01-03|892814514
2018-01-03|892778046
2018-01-03|892809386
2018-01-03|892816048
2018-01-03|892816048
2018-01-03|892816078
2018-01-03|892810643
I need to know the most visited house (house_id) in each month(month).
How do I do that? I did a groupby:
df_1.groupby(by=['house_id', 'month']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True)
But it say me anything. So I try to do that for each month:
df_1[df_1['month']==1].groupby(by=['house_id']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True).tail(1)
df_1[df_1['month']==2].groupby(by=['house_id']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True).tail(1)
and so on...
But I think there is a clever way to do that. But I don't know. Is it possible to iterate? How do I iterate to know the most visited house in each month ({1:'January', ... 12:'December'}) Thanks a lot
CodePudding user response:
Adopted from a similar answer
df = pd.DataFrame({'visit_date': ['2017-12-27', '2018-01-03', '2018-01-03', '2018-01-03',
'2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03',
'2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03',
'2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03',
'2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03'],
'house_id': [892815605, 892807836, 892815815, 892812970, 892803143, 892815463, 892816168,
892814475, 892813594, 892813557, 892809834, 892809834, 892803143, 892803143, 892800500,
892806236, 892810789, 892797487, 892815182, 892814514, 892778046, 892809386, 892816048,
892816048, 892816078, 892810643]})
df['month'] = pd.to_datetime(df['visit_date']).dt.month
df = df.groupby(['month','house_id']).size().groupby(level=0).nlargest(1).droplevel(1).reset_index(name='count')
print(df)
Output
month house_id count
0 1 892803143 3
1 12 892815605 1
CodePudding user response:
Is there a month column in the data frame or do we need to extract the month from the date?