I have a crime dataset, to be used in an ARIMA time series model, where every row is one recorded offence.
Date
0 2015-09-05
1 2015-09-05
2 2015-07-08
3 2017-09-05
4 2018-09-05
5 2018-09-05
I would like to group by date, so that offences that occurred on the same day are aggregated into a count.
Date Count
0 2015-09-05 2
1 2015-07-08 1
2 2017-09-05 1
3 2018-09-05 2
I'm struggling for two reasons: I eventually want to group by week within each year, and since I'm not aggregating the contents of an existing column, I need to count how many rows fall into each group.
Thank you.
CodePudding user response:
If your dataset is a dataframe, you can use:
df.assign(Count=1).groupby('Date')['Count'].count()
If it's a series:
series.to_frame().assign(Count=1).groupby('Date')['Count'].count()
For example:
df = pd.DataFrame({'Date':['2015-09-05',
'2015-09-05',
'2015-07-08',
'2017-09-05',
'2018-09-05',
'2018-09-05']})
df.assign(Count=1).groupby('Date')['Count'].count().reset_index()
Returns:
Date Count
0 2015-07-08 1
1 2015-09-05 2
2 2017-09-05 1
3 2018-09-05 2
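As a side note, the same result can be obtained without the helper column, since `groupby(...).size()` counts rows per group directly. A minimal sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-09-05', '2015-09-05', '2015-07-08',
                            '2017-09-05', '2018-09-05', '2018-09-05']})

# size() counts rows per group, so no Count column needs to be created first
counts = df.groupby('Date').size().reset_index(name='Count')
print(counts)
```

Unlike `count()`, which counts non-null values of a column, `size()` counts all rows in each group.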
CodePudding user response:
One way to do it is to use Python rather than pandas for the heavy lifting:
import datetime
from collections import Counter

import pandas as pd

dates = [datetime.datetime.strptime(x, "%Y-%m-%d").date()
         for x in ['2015-09-05', '2015-09-05', '2015-07-08',
                   '2017-09-05', '2018-09-05', '2018-09-05']]
df = pd.DataFrame(dates, columns=['Date'])

# Count occurrences of each date, preserving first-seen order
c = Counter(df['Date'])
df2 = pd.DataFrame(c.items(), columns=['Date', 'Count'])
print(df2)
Output:
Date Count
0 2015-09-05 2
1 2015-07-08 1
2 2017-09-05 1
3 2018-09-05 2
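Since the question also mentions grouping by week for the ARIMA model, one possible extension (a sketch, assuming pandas' default `'W'` frequency with weeks ending on Sunday is acceptable) is to parse the dates and group with `pd.Grouper`:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-09-05', '2015-09-05', '2015-07-08',
                            '2017-09-05', '2018-09-05', '2018-09-05']})
df['Date'] = pd.to_datetime(df['Date'])

# Count offences per calendar week (weeks ending on Sunday by default)
weekly = df.groupby(pd.Grouper(key='Date', freq='W')).size()

# Grouper fills in every intermediate week with a zero count; for ARIMA you
# may want to keep those zeros, but they can be dropped like this if not
weekly = weekly[weekly > 0]
print(weekly)
```

Whether to keep or drop the zero-count weeks depends on the model: ARIMA generally expects a regularly spaced series, so keeping them is usually the right call.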