Home > Mobile >  get the average time that an object stayed in a certain state in Pandas
get the average time that an object stayed in a certain state in Pandas

Time:11-19

I have got a large DF that contains some sales opportunities. this opps change stage several times during their lifecycle and we can see what those changes are and when were they made. The possible stages are:

Closed Won        
Closed No Deal    
Propose           
Negotiate          
Qualify            
Closed Lost        
Invalid            
Identify           
Implemented        
Close Lost            
Close Won                         
Close No Deal

I also know the OldValue before the opp was changed. What i need to do is to have a separate DF where i get the median value of how long an opportunity stayed at each stage. I have researched the web and SO and i cannot find a solution that meets my criteria.

Here is some sample data:

reatedDate                    OpportunityId     OldValue    NewValue
2020-05-05T12:04:32.000Z    0060N00000TbLneQAF  Propose     Qualify
2020-07-06T08:44:08.000Z    0060N00000TbLneQAF  Qualify     Identify
2020-08-05T08:59:45.000Z    0060N00000TbLneQAF  Identify    Qualify
2020-08-05T12:02:59.000Z    0060N00000TbLneQAF  Qualify     Propose
2020-09-22T06:47:16.000Z    0060N00000TbLneQAF  Propose     Qualify
2020-10-08T15:33:29.000Z    0060N00000TbLneQAF  Qualify     Identify
2020-10-08T15:40:21.000Z    0060N00000TbLneQAF  Identify    Closed No Deal
2021-07-29T07:57:28.000Z    0060N00000TbLohQAF  Identify    Closed No Deal
2021-10-17T03:07:24.000Z    0060N00000TbLtwQAF  Qualify     Closed No Deal
2021-07-27T13:57:34.000Z    0060N00000TbMhkQAF  Identify    Closed No Deal
2020-04-22T13:35:30.000Z    0060N00000TbMkjQAF  Negotiate   Closed Lost
2020-09-25T09:37:32.000Z    0060N00000TbN8qQAF  Qualify     Propose
2020-09-25T09:37:41.000Z    0060N00000TbN8qQAF  Propose     Negotiate
2021-09-06T14:31:05.000Z    0060N00000TbN8qQAF  Negotiate   Propose
2021-11-03T11:09:56.000Z    0060N00000TbNF8QAN  Identify    Qualify
2020-04-29T15:43:58.000Z    0060N00000TbNFSQA3  Identify    Invalid
2021-01-07T09:35:56.000Z    0060N00000TbNUDQA3  Identify    Closed No Deal
2020-12-03T08:53:12.000Z    0060N00000TbNUSQA3  Qualify     Identify
2021-09-08T09:41:54.000Z    0060N00000TbNUSQA3  Identify    Closed Lost
2021-04-14T07:31:49.000Z    0060N00000TbNg4QAF  Identify    Closed No Deal
2020-04-27T12:19:51.000Z    0060N00000TbNwCQAV  Qualify     Identify
2020-05-04T11:15:00.000Z    0060N00000TbNxPQAV  Identify    Closed No Deal
2021-05-24T03:13:10.000Z    0060N00000TbNywQAF  Qualify     Closed No Deal
2021-05-28T14:51:32.000Z    0060N00000TbO3SQAV  Identify    Invalid
2021-07-27T13:25:50.000Z    0060N00000TbOBlQAN  Identify    Closed No Deal
2021-07-27T13:25:50.000Z    0060N00000TbOCeQAN  Identify    Closed No Deal
2021-07-27T13:25:50.000Z    0060N00000TbOELQA3  Identify    Closed No Deal
2020-04-28T15:12:53.000Z    0060N00000TbOIrQAN  Qualify     Negotiate
2020-05-18T14:11:18.000Z    0060N00000TbOIrQAN  Negotiate   Closed Won
2021-07-27T13:22:09.000Z    0060N00000TbOJzQAN  Identify    Closed No Deal
2021-07-27T13:22:09.000Z    0060N00000TbOLbQAN  Identify    Closed No Deal
2020-08-13T05:04:36.000Z    0060N00000TbOOzQAN  Propose     Identify
2020-09-06T15:36:36.000Z    0060N00000TbOOzQAN  Identify    Invalid
2021-05-27T14:22:10.000Z    0060N00000TbOWKQA3  Qualify     Identify
2021-05-27T14:22:27.000Z    0060N00000TbOWKQA3  Identify    Closed Lost
2020-04-27T12:25:52.000Z    0060N00000TbOX3QAN  Qualify     Identify
2021-01-08T15:27:33.000Z    0060N00000TbOX3QAN  Identify    Qualify
2020-04-13T10:53:57.000Z    0060N00000TbOY6QAN  Qualify     Identify
2020-12-03T10:38:35.000Z    0060N00000TbOY6QAN  Identify    Closed Lost
2020-04-13T10:54:57.000Z    0060N00000TbOYaQAN  Qualify     Identify
2020-12-01T10:50:41.000Z    0060N00000TbOYaQAN  Identify    Closed No Deal
2021-07-27T13:57:34.000Z    0060N00000TbOb5QAF  Identify    Closed No Deal
2021-07-27T13:57:34.000Z    0060N00000TbOeYQAV  Identify    Closed No Deal
2020-05-29T12:28:44.000Z    0060N00000TbOgyQAF  Identify    Qualify
2020-12-18T07:34:18.000Z    0060N00000TbOgyQAF  Qualify     Identify
2020-12-18T07:34:43.000Z    0060N00000TbOgyQAF  Identify    Invalid
2021-07-27T13:22:09.000Z    0060N00000TbOhSQAV  Identify    Closed No Deal
2020-04-15T11:30:09.000Z    0060N00000TbOk7QAF  Identify    Invalid
2020-08-26T03:16:46.000Z    0060N00000TbOnVQAV  Qualify     Closed No Deal
2020-04-03T13:08:23.000Z    0060N00000TbOy4QAF  Identify    Closed Lost      

what i have done so far is try to use the group_by function and then try to get the median but i get stuck because i need to be looking at the dates of when the opp was changed and compare that to the time it took to get the new value. I would be looking at the below outcome:

Stage            Average Time
Closed Won        x days
Closed No Deal    x days
Propose           x days
Negotiate         x days 
Qualify           x days 
Closed Lost       x days 
Invalid           x days 
Identify          x days 
Implemented       x days 
Close Lost        x days 
Close Won         x days                
Close No Deal     x days

Hope that makes sense. i am a total noob by the way so thank you for helping!

CodePudding user response:

This worked for me, but you forgot to mention wich "OldValue" represents the creation of an OPP.

df['difference'] = df.groupby('OpportunityId').reatedDate.diff()
df['aux'] = df['OldValue']   ' - '   df['NewValue']
df['days_diff'] = df['difference'].dt.days
df.groupby('aux')['days_diff'].mean()

Output:

aux                     days_difference
Identify - Closed Lost       170.666667
Identify - Closed No Deal    115.500000
Identify - Invalid            12.000000
Identify - Qualify           143.000000
Negotiate - Closed Lost             NaN
Negotiate - Closed Won        19.000000
Negotiate - Propose          346.000000
Propose - Identify                  NaN
Propose - Negotiate            0.000000
Propose - Qualify             47.000000
Qualify - Closed No Deal            NaN
Qualify - Identify            93.000000
Qualify - Negotiate                 NaN
Qualify - Propose              0.000000

CodePudding user response:

I think a possible solution would be to first sort the dataframe by OpportunityId and reatedDate:

sorted_df = df.sort_values(by=['OpportunityId','reatedDate']) 

Then you can get the row wise difference:

sorted_df['time_diff'] = sorted_df.reatedDate.diff()

Also you need to identify the cases where there is a transition in the ID. As those need to be excluded from the statistics:

sorted_df['id_diff'] = sorted_df.OpportunityId.diff()
filtered_df=sorted_df[sorted_df.id_diff==0]

Then you should be able to do your stats:

filtered_df.groupby(['OldValue'])[['time_diff']].median()

If you need only the days you might add a .dt.days

  • Related