Home > OS >  How to create features based on month-year and count in Python/Pandas?
How to create features based on month-year and count in Python/Pandas?

Time:11-02

Let's say I have a dataset with data from the last 5 months for some people. I want to create some features related to each of those 5 months, like that:

My initial dataframe is looking like that:

Person date
Person A 25-01-2015
Person B 29-01-2015
Person A 13-02-2015
Person A 14-05-2015
Person A 19-05-2015

I want my dataframe to look like that from below: So, creating 5 (or more columns, maybe 12 corresponding to all of the 12 months from 2015 year) related to month and year (grouping by the month and year) and then putting in the corresponding features their count based on the month and year.

Person 01-2015 02-2015 03-2015 04-2015 05-2015
Person A 2 1 0 0 2
Person B 1 0 0 0 0

Could you please suggest me how to do that?

CodePudding user response:

Use pivot:

out = df.groupby(['Person', pd.Grouper(freq='MS', key='date')], as_index=False) \
       .size().pivot('Person', 'date', 'size')

out = out.reindex(columns=pd.date_range(out.columns.min(), out.columns.max(), freq='MS')) \
         .fillna(0).astype(int).reset_index()

Output:

>>> out
          2015-01-01  2015-02-01  2015-03-01  2015-04-01  2015-05-01
Person                                                              
Person A           1           1           0           0           2
Person B           1           0           0           0           0
  • Related