How to create features based on month-year and count in Python/Pandas?-CodePudding

Let's say I have a dataset with data from the last 5 months for some people. I want to create some features related to each of those 5 months, like that:

My initial dataframe is looking like that:

Person	date
Person A	25-01-2015
Person B	29-01-2015
Person A	13-02-2015
Person A	14-05-2015
Person A	19-05-2015

I want my dataframe to look like that from below: So, creating 5 (or more columns, maybe 12 corresponding to all of the 12 months from 2015 year) related to month and year (grouping by the month and year) and then putting in the corresponding features their count based on the month and year.

Person	01-2015	02-2015	03-2015	04-2015	05-2015
Person A	2	1	0	0	2
Person B	1	0	0	0	0

Could you please suggest me how to do that?

CodePudding user response：

Use pivot:

out = df.groupby(['Person', pd.Grouper(freq='MS', key='date')], as_index=False) \
       .size().pivot('Person', 'date', 'size')

out = out.reindex(columns=pd.date_range(out.columns.min(), out.columns.max(), freq='MS')) \
         .fillna(0).astype(int).reset_index()

Output:

>>> out
          2015-01-01  2015-02-01  2015-03-01  2015-04-01  2015-05-01
Person                                                              
Person A           1           1           0           0           2
Person B           1           0           0           0           0