How to find out how long people take to pay for the services I offer

I have data that contains id, gender, price, and time to pay. For example:

import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
                    'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
                    'price': [250, 1000,300, 250, 1000, 500, 450, 500],
                    'timeToPay':['0 days 01:20:00','1 days 03:24:02','0 days 12:45:55','0 days 05:38:20','0 days 02:44:12','0 days 11:25:38','1 days 01:11:00','0 days 05:22:00']})

Time to pay is the time difference between when the customer orders and when they pay (datatype timedelta64[ns]).

How can I find the most common time-to-pay range for this data? I mean, do people pay within 0-1 hours, 4-6 hours, or maybe 1-2 days? I want to know how long people take to pay for the services I offer.

I tried grouping the data by time to pay, but I don't think that gives me the information I need.

CodePudding user response：

I would go for a histogram. Try it out with different bin sizes; the right size depends on the number of rows you have.

If you need to measure it for different services, you might need to split the data with a group by first and then plot a histogram for each group.

pandas.DataFrame.hist
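If you want the numbers behind the histogram rather than the plot, one option is to bin the durations yourself with pd.cut and count the rows per range. This is only a sketch on the sample rows from the question; the bin edges and labels are my own assumption, so adjust them to the ranges you care about.

```python
import pandas as pd

# Sample durations from the question, parsed into timedeltas.
time_to_pay = pd.Series(pd.to_timedelta([
    '0 days 01:20:00', '1 days 03:24:02', '0 days 12:45:55', '0 days 05:38:20',
    '0 days 02:44:12', '0 days 11:25:38', '1 days 01:11:00', '0 days 05:22:00',
]))

# Work in fractional hours so the bin edges are easy to read.
hours = time_to_pay.dt.total_seconds() / 3600

# Hypothetical bin edges (in hours) and labels; these are not from the question.
bin_edges = [0, 1, 4, 6, 12, 24, 48]
bin_labels = ['0-1 h', '1-4 h', '4-6 h', '6-12 h', '12-24 h', '1-2 days']

# Each interval is open on the left and closed on the right, e.g. (1, 4].
ranges = pd.cut(hours, bins=bin_edges, labels=bin_labels)

# sort=False keeps the ranges in their natural order instead of sorting by count.
counts = ranges.value_counts(sort=False)
print(counts)
```

With only eight rows no range clearly dominates, so this becomes more useful once you run it on the full data.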

Further, you could calculate the average, but how meaningful it is again depends on the distribution of your data, so basically you need to get to know your data first.
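To illustrate why the distribution matters, here is a small sketch on the sample rows that compares the mean with the median and the quartiles, overall and per gender. Converting to fractional hours first is my own choice to keep the arithmetic simple; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
    'timeToPay': ['0 days 01:20:00', '1 days 03:24:02', '0 days 12:45:55', '0 days 05:38:20',
                  '0 days 02:44:12', '0 days 11:25:38', '1 days 01:11:00', '0 days 05:22:00'],
})
df['timeToPay'] = pd.to_timedelta(df['timeToPay'])

# Fractional hours are plain floats, so every aggregation works on them.
df['hours'] = df['timeToPay'].dt.total_seconds() / 3600

mean_hours = df['hours'].mean()
median_hours = df['hours'].median()

# Quartiles show where the middle half of the customers falls.
quartiles = df['hours'].quantile([0.25, 0.5, 0.75])

# Median per group, as suggested above for splitting by service (here: gender).
median_by_gender = df.groupby('gender')['hours'].median()

print(f'mean: {mean_hours:.2f} h, median: {median_hours:.2f} h')
print(quartiles)
print(median_by_gender)
```

On these eight rows the mean (about 11.5 hours) sits well above the median (about 8.5 hours), because the two payments that took more than a day pull the mean up. For skewed data like this, the median and quartiles usually describe the "typical" customer better than the average.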

CodePudding user response：

IIUC,

I modified your code a little bit to make it easier to reproduce.

import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
                    'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
                    'price': [250, 1000,300, 250, 1000, 500, 450, 500],
                    'timeToPay':[ '0 days 01:20:00'
                                 ,'1 days 03:24:02'
                                 ,'0 days 12:45:55'
                                 ,'0 days 05:38:20'
                                 ,'0 days 02:44:12'
                                 ,'0 days 11:25:38'
                                 ,'1 days 01:11:00'
                                 ,'0 days 05:22:00']})
df1['timeToPay'] = pd.to_timedelta(df1['timeToPay'])

Now timeToPay is a timedelta, so you can convert it to whole hours and whole days (rounded up) with this snippet.

import math
df1['timeToPay_hour']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(60*60)))
df1['timeToPay_day']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(24*60*60)))

df1

Now your df1 looks like this:

	id	gender	price	timeToPay	timeToPay_hour	timeToPay_day
0	1	Male	250	0 days 01:20:00	2	1
1	2	Female	1000	1 days 03:24:02	28	2
2	3	Male	300	0 days 12:45:55	13	1
3	4	Female	250	0 days 05:38:20	6	1
4	5	Male	1000	0 days 02:44:12	3	1
5	6	Female	500	0 days 11:25:38	12	1
6	7	Male	450	1 days 01:11:00	26	2
7	8	Male	500	0 days 05:22:00	6	1

Then, you may compare timeToPay_hour by gender like this.

# gender is not numeric, so pass it as `by` to get one histogram per gender
df1.hist(column='timeToPay_hour', by='gender', bins=5)
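If you prefer a table to a plot, a cross-tabulation of gender against the rounded-up day gives the same comparison as counts. This is a sketch under the same rounding rule as above; np.ceil is swapped in for math.ceil so it works on the whole column at once, and the names table and share are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Male'],
    'timeToPay': ['0 days 01:20:00', '1 days 03:24:02', '0 days 12:45:55', '0 days 05:38:20',
                  '0 days 02:44:12', '0 days 11:25:38', '1 days 01:11:00', '0 days 05:22:00'],
})
df['timeToPay'] = pd.to_timedelta(df['timeToPay'])

# Round up to whole days: anything paid within the first 24 hours counts as day 1.
seconds_per_day = 24 * 60 * 60
df['timeToPay_day'] = np.ceil(df['timeToPay'].dt.total_seconds() / seconds_per_day).astype(int)

# Rows are genders, columns are days, cells are customer counts.
table = pd.crosstab(df['gender'], df['timeToPay_day'])

# normalize='index' turns each row into shares that sum to 1.
share = pd.crosstab(df['gender'], df['timeToPay_day'], normalize='index')

print(table)
print(share)
```

The same pattern works with timeToPay_hour, or with any binned column, in place of timeToPay_day.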

Hope this helps.