New column with the month starting from the first order for each user-CodePudding

I've got data like this:

user_id order_date
393     15/03/19
393     16/03/19
393     23/03/19
393     24/03/19
393     25/03/19
393     28/03/19
393     29/03/19
393     30/03/19
393     31/03/19
393     05/04/19   
1014    08/12/18   
1014    09/12/18   
1014    18/12/18   
1014    20/01/19   
1014    22/03/19   
1014    23/03/19   
1014    30/07/19

What I want to achieve is the following:

user_id order_date month_no
393     15/03/19   1
393     16/03/19   1
393     23/03/19   1
393     24/03/19   1
393     25/03/19   1
393     28/03/19   1
393     29/03/19   1
393     30/03/19   1
393     31/03/19   1
393     05/04/19   2
1014    08/12/18   1
1014    09/12/18   1
1014    18/12/18   1
1014    20/01/19   2
1014    22/03/19   3
1014    23/03/19   3
1014    30/07/19   4

What I was trying to do was:

df["month_no"] = ((df.order_date- df.groupby('user_id')['order_date'].transform('first')) // np.timedelta64(1, 'M'))   1

But it is not working as I intend. I think it's due to the fact that the first order data is in mid-March?

CodePudding user response：

You can use groupby.rank on the monthly period:

df['month_no'] = (pd.to_datetime(df['order_date'], dayfirst=True)
                  .dt.to_period('M')
                  .groupby(df['user_id'])
                  .rank('dense')
                  .convert_dtypes()
                  )

Output:

    user_id order_date  month_no
0       393   15/03/19         1
1       393   16/03/19         1
2       393   23/03/19         1
3       393   24/03/19         1
4       393   25/03/19         1
5       393   28/03/19         1
6       393   29/03/19         1
7       393   30/03/19         1
8       393   31/03/19         1
9       393   05/04/19         2
10     1014   08/12/18         1
11     1014   09/12/18         1
12     1014   18/12/18         1
13     1014   20/01/19         2
14     1014   22/03/19         3
15     1014   23/03/19         3
16     1014   30/07/19         4