I have data that contains id, gender, price, time to pay. For example:
import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
'price': [250, 1000,300, 250, 1000, 500, 450, 500],
'timeToPay':['0 days 01:20:00','1 days 03:24:02','0 days 12:45:55','0 days 05:38:20','0 days 02:44:12','0 days 11:25:38','1 days 01:11:00','0 days 05:22:00']})
Time to pay, is the time difference between when the customer orders and pays (datatype timedelta64[ns]).
How I can get the best time to pay range for this data, I mean, do people pay between 0-1 hours or 4-6 hours or maybe 1 day - 2 days. I want to know how long people pay for the services I offer.
I try to group by the data based on time to pay but I think it doesn't give information I need
CodePudding user response:
I would go for a histogram. Try it out with different bin size, it depends on the number of rows you have.
If you need to meassure it on different services, you might need to split the date with a group by first and then hist each group.
Further you could calculate the average, this again depends on your data dist, so basically you need to know your data first.
CodePudding user response:
IIUC,
I modified you code a little bit to make it easier to reproduce.
import pandas as pd
df1 = pd.DataFrame({'id': ['1','2','3','4','5','6','7','8'],
'gender': ['Male','Female','Male','Female','Male','Female','Male','Male'],
'price': [250, 1000,300, 250, 1000, 500, 450, 500],
'timeToPay':[ '0 days 01:20:00'
,'1 days 03:24:02'
,'0 days 12:45:55'
,'0 days 05:38:20'
,'0 days 02:44:12'
,'0 days 11:25:38'
,'1 days 01:11:00'
,'0 days 05:22:00']})
df1['timeToPay']=df1['timeToPay'].apply(lambda x: pd.Timedelta(x))
now timeToPay
is a timedelta
, then you may transform timeToPay
to hour
and day
with this snippet.
import math
df1['timeToPay_hour']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(60*60)))
df1['timeToPay_day']=df1['timeToPay'].apply(lambda x: math.ceil(x.total_seconds()/(24*60*60)))
df1
Now, your df1
looks like this
id | gender | price | timeToPay | timeToPay_hour | timeToPay_day | |
---|---|---|---|---|---|---|
0 | 1 | Male | 250 | 0 days 01:20:00 | 2 | 1 |
1 | 2 | Female | 1000 | 1 days 03:24:02 | 28 | 2 |
2 | 3 | Male | 300 | 0 days 12:45:55 | 13 | 1 |
3 | 4 | Female | 250 | 0 days 05:38:20 | 6 | 1 |
4 | 5 | Male | 1000 | 0 days 02:44:12 | 3 | 1 |
5 | 6 | Female | 500 | 0 days 11:25:38 | 12 | 1 |
6 | 7 | Male | 450 | 1 days 01:11:00 | 26 | 2 |
7 | 8 | Male | 500 | 0 days 05:22:00 | 6 | 1 |
Then, you may compare with gender with timeToPay_hour
like this.
df1[['gender','timeToPay_hour']].hist(bins=5)
Hope this help.