Home > database >  How iterate over a df to know the most frequent item in each month
How iterate over a df to know the most frequent item in each month

Time:07-18

I have the following Pandas DF:

visit_date|house_id
---------- ---------
2017-12-27|892815605
2018-01-03|892807836
2018-01-03|892815815
2018-01-03|892812970
2018-01-03|892803143
2018-01-03|892815463
2018-01-03|892816168
2018-01-03|892814475
2018-01-03|892813594
2018-01-03|892813557
2018-01-03|892809834
2018-01-03|892809834
2018-01-03|892803143
2018-01-03|892803143
2018-01-03|892800500
2018-01-03|892806236
2018-01-03|892810789
2018-01-03|892797487
2018-01-03|892815182
2018-01-03|892814514
2018-01-03|892778046
2018-01-03|892809386
2018-01-03|892816048
2018-01-03|892816048
2018-01-03|892816078
2018-01-03|892810643

I need to know the most visited house (house_id) in each month(month).

How do I do that? I did a groupby:

df_1.groupby(by=['house_id', 'month']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True)

But it say me anything. So I try to do that for each month:

df_1[df_1['month']==1].groupby(by=['house_id']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True).tail(1)

df_1[df_1['month']==2].groupby(by=['house_id']).count().reset_index().sort_values(by=['month'], ascending=True, ignore_index=True).tail(1)

and so on...

But I think there is a clever way to do that. But I don't know. Is it possible to iterate? How do I iterate to know the most visited house in each month ({1:'January', ... 12:'December'}) Thanks a lot

CodePudding user response:

Adopted from a similar answer

df = pd.DataFrame({'visit_date': ['2017-12-27',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03',
  '2018-01-03',  '2018-01-03',  '2018-01-03',  '2018-01-03'],
 'house_id': [892815605,  892807836,  892815815,  892812970,  892803143,  892815463,  892816168,
  892814475,  892813594,  892813557,  892809834,  892809834,  892803143,  892803143,  892800500,
  892806236,  892810789,  892797487,  892815182,  892814514,  892778046,  892809386,  892816048,
  892816048,  892816078,  892810643]})

df['month'] = pd.to_datetime(df['visit_date']).dt.month
df = df.groupby(['month','house_id']).size().groupby(level=0).nlargest(1).droplevel(1).reset_index(name='count')
print(df)

Output

   month   house_id  count
0      1  892803143      3
1     12  892815605      1

CodePudding user response:

Is there a month column in the data frame or do we need to extract the month from the date?

  • Related