Home > Net >  How can i check if a user repurchase from the same shop after they've cancelled their order?
How can i check if a user repurchase from the same shop after they've cancelled their order?

Time:10-30

My idea is to make a combination of userid and shopid where their status is cancelled, and filter the original data frame using this combination, but is it possible to do that?

df_final1[['userid', 'shopid', 'status', 
'created_time', 'cancel_time']].head()

         userid     shopid               status        created_time  \
1254  248390148  631980847                 Paid 2019-01-10 01:08:00   
2220  362408494  720477845                 Paid 2019-01-10 01:19:00   
4843  665442742  724056488                 Paid 2019-01-10 01:25:00   
378   144816680  550081468                 Paid 2019-01-10 02:33:00   
1526  278521124  240583029  Paid then Cancelled 2019-01-10 02:37:00   

             cancel_time  
1254                 NaT  
2220                 NaT  
4843                 NaT  
378                  NaT  
1526 2019-10-02 13:26:00  
df_final1[['userid', 'shopid', 'status', 
'created_time', 'cancel_time']].loc[df_final1['status']
.str.contains('cancelled', case=False)].head()

         userid     shopid                status        created_time  \
1526  278521124  240583029   Paid then Cancelled 2019-01-10 02:37:00   
1918  326681402  429387935  Unpaid and Cancelled 2019-01-10 05:05:00   
6233  818601412  439306870   Paid then Cancelled 2019-01-10 05:45:00   
3485  502346098  843003698  Unpaid and Cancelled 2019-01-10 08:04:00   
2682  414657587  998402920  Unpaid and Cancelled 2019-01-10 08:20:00   

             cancel_time  
1526 2019-10-02 13:26:00  
1918 2019-10-02 17:09:00  
6233 2019-10-03 23:32:00  
3485 2019-10-01 20:29:00  
2682 2019-10-02 12:59:00  

Okay so after some tries i got this, so now the problem is how should i filter the users who have repurchased from the same shop after cancelled?

print(repurchase_true[['userid', 'shopid', 'status']])

         userid     shopid                status
154   555157016  469831329                  Paid
715   690156368  954422341                  Paid
791   650971451  624341114                  Paid
2036  195170292  173105930   Paid then Cancelled
2040  195170292  173105930   Paid then Cancelled
2777  650971451  624341114  Unpaid and Cancelled
3587  652903759  315658623                  Paid
3749  650971451  624341114   Paid then Cancelled
3961  106380451  383755396                  Paid
4065  114564157  788847405   Paid then Cancelled
4068  650971451  579205102                  Paid
4117  652903759  771950408  Unpaid and Cancelled
4345  106380451  383755396   Paid then Cancelled
4384  106380451  383755396                  Paid
4608  690156368  954422341                  Paid
4750  555157016  469831329  Unpaid and Cancelled
4887  108425329  626078476   Paid then Cancelled
4923  108425329  626078476                  Paid
5099  195170292  173105930   Paid then Cancelled
5198  481323800  154396809  Unpaid and Cancelled
5227  555157016  843003698                  Paid
5288  481323800  154396809                  Paid
5577  195170292  843114676                  Paid
5675  195170292  843114676                  Paid
6190  652903759  315658623  Unpaid and Cancelled
6239  481323800  154396809                  Paid
6395  555157016  843003698   Paid then Cancelled
6634  690156368  954422341  Unpaid and Cancelled
6934  114564157  788847405   Paid then Cancelled

CodePudding user response:

Try this:

Creating a dataframe with status as Cancelled (as done by you):

df_cancel = df[['userid', 'shopid', 'status', 
'created_time', 'cancel_time']].loc[df['status']
.str.contains('cancelled', case=False)]

Similarly, creating a df with status as not cancelled (here I used cancel_time to get this, you may use anything that works with you):

df_active = df[['userid', 'shopid', 'status', 
'created_time', 'cancel_time']].loc[df['cancel_time'] == 'NaT']

Check if everything is good using df_cancel.head() and df_active.head()

Renaming the columns, so that it would be easy to filter:

df_cancel.rename(columns = {'status':'c_status', 'created_time':'c_created_time',
                              'cancel_time':'c_cancel_time'}, inplace = True)

df_active.rename(columns = {'status':'a_status', 'created_time':'a_created_time',
                              'cancel_time':'a_cancel_time'}, inplace = True)

Merging both the dataframes on userid and shopid:

merged = pd.merge(df_active, df_cancel, how="inner", left_on=['userid', 'shopid'], right_on=['userid', 'shopid']);

Filtering the merged dataframe to find purchase date after the cancel_date which is the desired result:

df_final = merged[['userid', 'shopid', 'a_status', 
'a_created_time', 'a_cancel_time']].loc[merged['a_created_time'] > merged['c_cancel_time']]

df_final.head()

Note: You may add more columns to the df_final or alter any conditions used above based on your requirement.

  • Related