How to group-by and count dates each month per pages with Pandas?

Time:09-28

I have a CSV of user reactions on pages. I need to count how many posts each page made per month. (The dataset also has users, but that is irrelevant here.)

page,postingdate
page1,2021-01-10 08:02:05
page1,2021-01-10 08:02:05 // 01 month 1 post 2 reactions
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10 //01 month 2 post 3 reactions
page1,2021-02-20 08:02:05 
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05 //02 month 1 post
page1,2021-03-21 09:02:05
page1,2021-03-22 10:02:05
page1,2021-03-23 11:02:05
page1,2021-03-23 11:02:05 //03 month 3 posts 4 reactions
page2,2021-01-15 08:02:05
page2,2021-01-15 08:02:05

My expected output:
page1 2021-01-31 00:00:00 00:00       2 
      2021-02-28 00:00:00 00:00       1
      2021-03-31 00:00:00 00:00       3
page2 2021-01-31 00:00:00 00:00       1

So I need to group by page, split the data by month, and count the distinct posting times in each month.

I have an example pandas script that groups by page and counts the unique posts over the whole dataset:

df.groupby(by='page', as_index=False).agg({'postingdate': pd.Series.nunique})

My problem with this script is that it doesn't break the count down by month.
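For reference, here is a minimal sketch that reproduces the situation on the sample rows above. The inline // notes are removed so the dates parse, and the names csv_text and per_page are only illustrative, not part of the original script.

```python
import io
import pandas as pd

# Sample rows from the question, with the inline "//" notes removed.
csv_text = """page,postingdate
page1,2021-01-10 08:02:05
page1,2021-01-10 08:02:05
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-03-21 09:02:05
page1,2021-03-22 10:02:05
page1,2021-03-23 11:02:05
page1,2021-03-23 11:02:05
page2,2021-01-15 08:02:05
page2,2021-01-15 08:02:05
"""

# parse_dates makes postingdate a datetime64 column instead of plain strings
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['postingdate'])

# The script from the question: one row per page, counting unique
# posting timestamps over the whole period (no monthly breakdown)
per_page = df.groupby(by='page', as_index=False).agg({'postingdate': pd.Series.nunique})
print(per_page)
```

On these rows the result has a single count per page (6 unique timestamps for page1, 1 for page2), which shows that the month dimension is missing.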

CodePudding user response:

Add grouping by month with pd.Grouper:

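# pd.Grouper needs a datetime column; convert in case it was read as strings
df['postingdate'] = pd.to_datetime(df['postingdate'])
# count unique posting timestamps per page and calendar month
out = (df.groupby(['page', pd.Grouper(freq='M', key='postingdate')], as_index=False)
         .agg({'postingdate': pd.Series.nunique}))
print(out)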
    page  postingdate
0  page1            2
1  page1            1
2  page1            3
3  page2            1

If you also need the month-end dates in the output:

# start from the original df (postingdate as datetime); the month-end date stays as a column
out = (df.groupby(['page', pd.Grouper(freq='M', key='postingdate')])['postingdate']
         .nunique()
         .reset_index(name='val'))
print(out)
    page postingdate  val
0  page1  2021-01-31    2
1  page1  2021-02-28    1
2  page1  2021-03-31    3
3  page2  2021-01-31    1
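As a variation on the approach above, the month can also be derived with dt.to_period('M') and used as a grouping key. This is only a sketch under some assumptions: the sample rows are loaded from an inline string, and the names csv_text, month and monthly are made up for illustration. It may be worth knowing because recent pandas releases (2.2 and later, as far as I know) warn that the 'M' offset alias used with pd.Grouper is deprecated in favor of 'ME', while 'M' remains the monthly period alias.

```python
import io
import pandas as pd

# Sample rows from the question, with the inline "//" notes removed.
csv_text = """page,postingdate
page1,2021-01-10 08:02:05
page1,2021-01-10 08:02:05
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-03-21 09:02:05
page1,2021-03-22 10:02:05
page1,2021-03-23 11:02:05
page1,2021-03-23 11:02:05
page2,2021-01-15 08:02:05
page2,2021-01-15 08:02:05
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['postingdate'])

# Monthly period (such as 2021-01) for every row, named so that it
# becomes a readable column after reset_index
month = df['postingdate'].dt.to_period('M').rename('month')

# Unique posting timestamps per page and month
monthly = (df.groupby(['page', month])['postingdate']
             .nunique()
             .reset_index(name='val'))
print(monthly)
```

The month column here holds periods rather than month-end timestamps; if timestamps are preferred, the Grouper version above keeps them directly.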