I would like to count the number of daily cases within a certain month as follows:
import pandas as pd
d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31",
                            "2010-08-30", "2010-09-01"]})
The final outcome should look like this:
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1
Do you have any ideas about how to produce a DataFrame like the one above? I tried the groupby and apply functions but could not get this result. Thanks in advance!
CodePudding user response:
Use Series.dt.to_period for month periods and count with GroupBy.size:
# convert to datetimes (needed for the sample data, where the dates are strings)
d1['date'] = pd.to_datetime(d1['date'])
df = (d1.groupby(['ID', d1['date'].dt.to_period('M').rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1
Another idea, with Series.dt.strftime:
# convert to datetimes (needed for the sample data, where the dates are strings)
d1['date'] = pd.to_datetime(d1['date'])
df = (d1.groupby(['ID', d1['date'].dt.strftime('%Y-%m').rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1
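Both approaches print the same table, but the year_month column ends up with a different type, which may matter for later processing. A minimal sketch, assuming the d1 data from the question (the names by_period, by_string and month_starts are only illustrative):

```python
import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31",
                            "2010-08-30", "2010-09-01"]})
d1['date'] = pd.to_datetime(d1['date'])

by_period = (d1.groupby(['ID', d1['date'].dt.to_period('M').rename('year_month')])
               .size()
               .reset_index(name='count'))
by_string = (d1.groupby(['ID', d1['date'].dt.strftime('%Y-%m').rename('year_month')])
               .size()
               .reset_index(name='count'))

# to_period keeps month-aware Period values; strftime produces plain strings
print(by_period['year_month'].dtype)
print(type(by_string['year_month'].iloc[0]))

# Periods can be converted back to timestamps (the start of each month)
month_starts = by_period['year_month'].dt.to_timestamp()
print(month_starts.head())
```

If the column is only needed for display, the string version is probably enough; the Period version is likely more convenient when further date arithmetic is planned.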
If the date column still holds strings rather than datetimes, slice the first seven characters:
df = (d1.groupby(['ID', d1['date'].str[:7].rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
  ID year_month  count
0  A    2010-02      2
1  A    2010-12      1
2  B    2012-01      2
3  C    2011-01      3
4  C    2014-02      1
5  D    2010-08      2
6  D    2010-09      1
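The string slice only gives a meaningful year-month if every value is formatted as YYYY-MM-DD. A possible guard, sketched under that assumption (the iso_like check is an addition for illustration, not part of the answer above):

```python
import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31",
                            "2010-08-30", "2010-09-01"]})

# Illustrative safety check: every value must look like YYYY-MM-DD
iso_like = d1['date'].str.fullmatch(r'\d{4}-\d{2}-\d{2}')
if not iso_like.all():
    bad = d1.loc[~iso_like, 'date'].tolist()
    raise ValueError(f"unexpected date strings: {bad}")

# Safe to slice: the first seven characters are the year and month
df = (d1.groupby(['ID', d1['date'].str[:7].rename('year_month')])
        .size()
        .reset_index(name='count'))
print(df)
```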
CodePudding user response:
Using apply and groupby should work:
import pandas as pd
d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31",
                            "2010-08-30", "2010-09-01"]})
# axis=1 applies the lambda to each row; the default (axis=0) would pass whole columns
d1["month_year"] = d1.apply(lambda row: row["date"][:7], axis=1)
month_year = d1.groupby("month_year").size().reset_index(name="count")
print(month_year)
This results in counts per month across all IDs:
  month_year  count
0    2010-02      2
1    2010-08      2
2    2010-09      1
3    2010-12      1
4    2011-01      3
5    2012-01      2
6    2014-02      1
You will probably want to change the apply lambda to handle the date more carefully.
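One way to handle the date more carefully is to parse it instead of slicing the string. This is only a sketch: dropping unparseable rows is an assumption (raising an error may suit you better), and the names parsed, valid and result are illustrative. It also groups by ID as well as month, so the counts are per ID as in the question.

```python
import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31",
                            "2010-08-30", "2010-09-01"]})

# errors='coerce' turns strings that do not match the format into NaT instead of raising
parsed = pd.to_datetime(d1['date'], format='%Y-%m-%d', errors='coerce')

# Assumption: rows with an invalid date are dropped before counting
valid = d1.loc[parsed.notna()].assign(year_month=parsed.dt.to_period('M'))

# Count rows per ID and month
result = (valid.groupby(['ID', 'year_month'])
               .size()
               .reset_index(name='count'))
print(result)
```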