I have a CSV of user reactions on pages, one row per reaction. I need to count how many posts each page makes per month. (The dataset also contains users, but they are irrelevant here.)
page,postingdate
page1,2021-01-10 08:02:05
page1,2021-01-10 08:02:05 // 01 month 1 post 2 reactions
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10 //01 month 2 post 3 reactions
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05 //02 month 1 post
page1,2021-03-21 09:02:05
page1,2021-03-22 10:02:05
page1,2021-03-23 11:02:05
page1,2021-03-23 11:02:05 //03 month 3 posts 4 reactions
page2,2021-01-15 08:02:05
page2,2021-01-15 08:02:05
My expected output:
page1 2021-01-31 00:00:00 00:00 2
2021-02-28 00:00:00 00:00 1
2021-03-31 00:00:00 00:00 3
page2 2021-01-31 00:00:00 00:00 1
So I need to group by page, break the data down by month, and count the distinct posting times in each month.
I have an example pandas script that groups by page and counts all posts:
df.groupby(by='page', as_index=False).agg({'postingdate': pd.Series.nunique})
My problem with this script is that it doesn't break the count down by month.
CodePudding user response:
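Add grouping by month with pd.Grouper (postingdate must be a datetime column, so convert it with pd.to_datetime first if it is still a string):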
# Assign to a new name so the original df stays available for the next snippet
out = (df.groupby(['page', pd.Grouper(freq='M', key='postingdate')], as_index=False)
         .agg({'postingdate': pd.Series.nunique}))
print(out)
page postingdate
0 page1 2
1 page1 1
2 page1 3
3 page2 1
If you also need the dates:
df = (df.groupby(['page', pd.Grouper(freq='M', key='postingdate')])['postingdate']
        .nunique()
        .reset_index(name='val'))
print(df)
page postingdate val
0 page1 2021-01-31 2
1 page1 2021-02-28 1
2 page1 2021-03-31 3
3 page2 2021-01-31 1
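For anyone who wants to try this end to end, here is a minimal self-contained sketch built from the sample rows in the question. Note that it swaps in a different technique from the snippets above: it groups on `dt.to_period('M')` instead of `pd.Grouper(freq='M')`, because newer pandas releases deprecate the `'M'` offset alias in favour of `'ME'`. The names `csv_text`, `monthly`, `month` and `posts` are made up for illustration.

```python
import io

import pandas as pd

# Sample rows from the question (inline comments removed); one row per reaction.
csv_text = """page,postingdate
page1,2021-01-10 08:02:05
page1,2021-01-10 08:02:05
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-01-10 09:10:10
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-02-20 08:02:05
page1,2021-03-21 09:02:05
page1,2021-03-22 10:02:05
page1,2021-03-23 11:02:05
page1,2021-03-23 11:02:05
page2,2021-01-15 08:02:05
page2,2021-01-15 08:02:05
"""

# parse_dates makes postingdate a datetime64 column, which the .dt accessor requires.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["postingdate"])

# Group by page and calendar month, then count distinct posting timestamps
# (several reactions on the same post share one timestamp).
month = df["postingdate"].dt.to_period("M").rename("month")
monthly = (
    df.groupby(["page", month])["postingdate"]
      .nunique()
      .reset_index(name="posts")
)
print(monthly)
```

With the sample data this should give 2, 1 and 3 posts for page1 in January, February and March, and 1 post for page2 in January, matching the counts above. The `month` column holds monthly periods such as 2021-01 rather than month-end timestamps; if month-end dates are needed, `monthly["month"].dt.to_timestamp(how="end")` is one way to get them.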