Home > other >  Create new columns based on customer status when it changes
Create new columns based on customer status when it changes

Time:07-24

I have a pandas dataframe ; sample shown below:

my_id  Status     my_date  
434     A        2021-12-20
434     S1       2021-12-23
434     S1       2021-12-24
434     S1       2021-12-24
434     S1       2021-12-25
434     S2       2022-01-05
434     S2       2022-01-06
434     S2       2022-01-07
434     S2       2022-01-08
434     S2       2022-01-09
434     S2       2022-01-10
434     B        2022-01-11
434     S1       2022-10-20
434     S1       2022-10-24
434     S2       2022-10-25
980     S1       2021-12-20
341     S1       2021-08-18
387     S1       2022-03-15
625     S1       2022-02-16
625     S1       2022-02-22
625     S2       2022-02-25
905     S1       2022-03-14
905     S1       2022-03-16
905     S1       2022-03-18

I want to create three columns :

  1. received_date : Looks at Status per my_id ; if the first status is "S1" ; then the received_date is equal to my_date; if the status change to any other status and changes back to "S1" then received_date again equals to my_date else NAN

  2. accept_date : Looks at Status per my_id ; if the status is "S2" ; then the accept_date is equal to my_date; if the status change to any other status and changes back to "S2" then received_date again equals to my_date else NAN

  3. end_date : If the status changes from "S1" or "S2" to any other status then end_date is equal to my_date else it equals to last my_date per my_id

Expected Output :

my_id  Status     my_date       received_date  Accept_date End_date
434     A        2021-12-20         NAN        NAN        NAN
434     S1       2021-12-23     2021-12-23     NAN        NAN
434     S1       2021-12-24         NAN        NAN        NAN
434     S1       2021-12-24         NAN        NAN        NAN
434     S1       2021-12-25         NAN        NAN        NAN
434     S2       2022-01-05         NAN      2022-01-05   NAN
434     S2       2022-01-06         NAN        NAN        NAN
434     S2       2022-01-07         NAN        NAN        NAN
434     S2       2022-01-08         NAN        NAN        NAN
434     S2       2022-01-09         NAN        NAN        NAN   
434     S2       2022-01-10         NAN        NAN        NAN
434     B        2022-01-11         NAN        NAN       2022-01-11
434     S1       2022-10-20     2021-10-20     NAN        NAN
434     S1       2022-10-24         NAN        NAN        NAN
434     S2       2022-10-25         NAN      2022-10-25 2022-10-25
980     S1       2021-12-20     2021-12-20     NAN      2021-12-20
341     S1       2021-08-18     2021-08-18     NAN      2021-08-18
387     S1       2022-03-15     2022-03-15     NAN      2022-03-15
625     S1       2022-02-16     2022-02-16     NAN        NAN
625     S1       2022-02-22         NAN        NAN        NAN
625     S2       2022-02-25         NAN      2022-02-25 2022-02-25
905     S1       2022-03-14     2022-03-14     NAN        NAN
905     S1       2022-03-16         NAN        NAN        NAN
905     S1       2022-03-18         NAN        NAN      2022-03-18

CodePudding user response:

I don't get why B has a value in End_Date, apart from that, the rest is relatively easily done with boolean indexing:

m1 = df['Status'].str.match(r'S\d ')
m2 = df['Status'].ne(df.groupby('my_id')['Status'].shift())
m3 = (~m1)&m1.groupby(df['my_id']).shift()

out = (df
 .join(df
       .loc[m1&m2, ['Status', 'my_date']]
       .set_index('Status', append=True)['my_date'].unstack()
       .rename(columns={'S1': 'received_date', 'S2': 'Accept_date'})
      )
 .join(pd.concat([df.loc[m3, 'my_date'],
                  df.groupby('my_id')['my_date'].tail(1)]).rename('End_date')
      )
)

output:

    my_id Status     my_date received_date Accept_date    End_date
0     434      A  2021-12-20           NaN         NaN         NaN
1     434     S1  2021-12-23    2021-12-23         NaN         NaN
2     434     S1  2021-12-24           NaN         NaN         NaN
3     434     S1  2021-12-24           NaN         NaN         NaN
4     434     S1  2021-12-25           NaN         NaN         NaN
5     434     S2  2022-01-05           NaN  2022-01-05         NaN
6     434     S2  2022-01-06           NaN         NaN         NaN
7     434     S2  2022-01-07           NaN         NaN         NaN
8     434     S2  2022-01-08           NaN         NaN         NaN
9     434     S2  2022-01-09           NaN         NaN         NaN
10    434     S2  2022-01-10           NaN         NaN         NaN
11    434      B  2022-01-11           NaN         NaN  2022-01-11
12    434     S1  2022-10-20    2022-10-20         NaN         NaN
13    434     S1  2022-10-24           NaN         NaN         NaN
14    434     S2  2022-10-25           NaN  2022-10-25  2022-10-25
15    980     S1  2021-12-20    2021-12-20         NaN  2021-12-20
16    341     S1  2021-08-18    2021-08-18         NaN  2021-08-18
17    387     S1  2022-03-15    2022-03-15         NaN  2022-03-15
18    625     S1  2022-02-16    2022-02-16         NaN         NaN
19    625     S1  2022-02-22           NaN         NaN         NaN
20    625     S2  2022-02-25           NaN  2022-02-25  2022-02-25
21    905     S1  2022-03-14    2022-03-14         NaN         NaN
22    905     S1  2022-03-16           NaN         NaN         NaN
23    905     S1  2022-03-18           NaN         NaN  2022-03-18
  • Related