Let's say I have a dataset with data from the last 5 months for some people. I want to create some features related to each of those 5 months, like that:
My initial dataframe is looking like that:
Person | date |
---|---|
Person A | 25-01-2015 |
Person B | 29-01-2015 |
Person A | 13-02-2015 |
Person A | 14-05-2015 |
Person A | 19-05-2015 |
I want my dataframe to look like that from below: So, creating 5 (or more columns, maybe 12 corresponding to all of the 12 months from 2015 year) related to month and year (grouping by the month and year) and then putting in the corresponding features their count based on the month and year.
Person | 01-2015 | 02-2015 | 03-2015 | 04-2015 | 05-2015 |
---|---|---|---|---|---|
Person A | 2 | 1 | 0 | 0 | 2 |
Person B | 1 | 0 | 0 | 0 | 0 |
Could you please suggest me how to do that?
CodePudding user response:
Use pivot
:
out = df.groupby(['Person', pd.Grouper(freq='MS', key='date')], as_index=False) \
.size().pivot('Person', 'date', 'size')
out = out.reindex(columns=pd.date_range(out.columns.min(), out.columns.max(), freq='MS')) \
.fillna(0).astype(int).reset_index()
Output:
>>> out
2015-01-01 2015-02-01 2015-03-01 2015-04-01 2015-05-01
Person
Person A 1 1 0 0 2
Person B 1 0 0 0 0