I have a crime dataset, to be used in an ARIMA time series model, where every row is one recorded offence.
Date
0 2015-09-05
1 2015-09-05
2 2015-07-08
3 2017-09-05
4 2018-09-05
5 2018-09-05
I would like to group by date, so that offences that occurred on the same day are aggregated into a count.
Date Count
0 2015-09-05 2
1 2015-07-08 1
2 2017-09-05 1
3 2018-09-05 2
I'm struggling for two reasons: I eventually want to group by week within each year, and since I'm not aggregating the contents of an existing column, I need to count how many rows fall into each group.
Thank you.
CodePudding user response:
If your dataset is a dataframe, you can use:
df.assign(Count=1).groupby('Date')['Count'].count()
If it's a series:
series.to_frame().assign(Count=1).groupby('Date')['Count'].count()
For example:
df = pd.DataFrame({'Date':['2015-09-05',
'2015-09-05',
'2015-07-08',
'2017-09-05',
'2018-09-05',
'2018-09-05']})
df.assign(Count=1).groupby('Date')['Count'].count().reset_index()
Returns:
Date Count
0 2015-07-08 1
1 2015-09-05 2
2 2017-09-05 1
3 2018-09-05 2
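As a side note, the same result can be obtained without the helper column, since `groupby(...).size()` counts rows per group directly. A minimal sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-09-05', '2015-09-05', '2015-07-08',
                            '2017-09-05', '2018-09-05', '2018-09-05']})

# size() counts rows per group, so no Count column needs to be created first
counts = df.groupby('Date').size().reset_index(name='Count')
print(counts)
```

Unlike `count()`, which counts non-null values of a column, `size()` counts all rows in each group.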
CodePudding user response:
One way to do it is to use Python rather than pandas for the heavy lifting:
import datetime
from collections import Counter

import pandas as pd

dates = [datetime.datetime.strptime(x, "%Y-%m-%d").date()
         for x in ['2015-09-05', '2015-09-05', '2015-07-08',
                   '2017-09-05', '2018-09-05', '2018-09-05']]
df = pd.DataFrame(dates, columns=['Date'])

# Count occurrences of each date, preserving first-seen order
c = Counter(df['Date'])
df2 = pd.DataFrame(c.items(), columns=['Date', 'Count'])
print(df2)
Output:
Date Count
0 2015-09-05 2
1 2015-07-08 1
2 2017-09-05 1
3 2018-09-05 2
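Since the question also mentions grouping by week for the ARIMA model, one possible extension (a sketch, assuming pandas' default `'W'` frequency with weeks ending on Sunday is acceptable) is to parse the dates and group with `pd.Grouper`:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-09-05', '2015-09-05', '2015-07-08',
                            '2017-09-05', '2018-09-05', '2018-09-05']})
df['Date'] = pd.to_datetime(df['Date'])

# Count offences per calendar week (weeks ending on Sunday by default)
weekly = df.groupby(pd.Grouper(key='Date', freq='W')).size()

# Grouper fills in every intermediate week with a zero count; for ARIMA you
# may want to keep those zeros, but they can be dropped like this if not
weekly = weekly[weekly > 0]
print(weekly)
```

Whether to keep or drop the zero-count weeks depends on the model: ARIMA generally expects a regularly spaced series, so keeping them is usually the right call.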