How to find average time between delivered_date for each product_id?
Dataset Sample
data = {
'product_id':[27036,1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280,28712],
'delivered_datetime':pd.to_datetime(["2021-11-20 04:17:07 UTC", "2020-11-27 00:23:33 UTC", "2020-11-28 22:51:57 UTC", "2022-02-08 07:15:09 UTC","2022-01-21 09:10:38 UTC", "2020-11-12 12:57:58 UTC","2021-10-26 23:03:33 UTC","2022-01-01 09:00:10 UTC", "2022-02-14 05:34:04 UTC", "2022-02-13 05:52:01 UTC", "2022-03-08 10:50:06 UTC", "2020-10-08 09:53:50 UTC", "2021-03-23 14:33:21 UTC", "2021-03-21 13:19:59 UTC", "2021-12-31 13:41:11 UTC", "2021-05-17 03:09:10 UTC", "2021-05-17 01:06:48 UTC", "2021-06-10 15:47:14 UTC", "2022-03-07 07:23:15 UTC","2022-02-23 14:58:57 UTC"]),
'price':[12.55, 44.7, 6.84, 15.73, 2.12, 27.11, 8.73, 18.17, 4.26, 13.64, 21.62, 64.26, 14.72, 11.6, 15.05, 14.79, 16.73, 3.14, 39.82, 25.56
],
}
df =pd.DataFrame(data)
|id | product_id| delivered_date | price |
|-- | ----------| ------------------------- | ----- |
| 0 | 27036 | 2021-11-20 04:17:07 00:00 | 12.55 |
| 1 | 1748 | 2020-11-27 00:23:33 00:00 | 44.70 |
| 2 | 8849 | 2020-11-28 22:51:57 00:00 | 6.84 |
| 3 | 10280 | 2022-02-08 07:15:09 00:00 | 15.73 |
| 4 | 28712 | 2022-01-21 09:10:38 00:00 | 2.12 |
| 5 | 27036 | 2020-11-12 12:57:58 00:00 | 27.11 |
| 6 | 1748 | 2021-10-26 23:03:33 00:00 | 8.73 |
| 7 | 8849 | 2022-01-01 09:00:10 00:00 | 18.17 |
| 8 | 10280 | 2022-02-14 05:34:04 00:00 | 4.2 |
| 9 | 28712 | 2022-02-13 05:52:01 00:00 | 13.64 |
| 10| 27036 | 2022-03-08 10:50:06 00:00 | 21.62 |
| 11| 1748 | 2020-10-08 09:53:50 00:00 | 64.26 |
Average time between delivered_dates for each product_id?
df['interval'] = df.groupby('product_id')['delivered_datetime']\
.diff().mean()
This produces one time interval for all rows, rather than separate averages for each product.
Expected output
|id | product_id| delivered_date | price |Exp_out|
|-- | ----------| ------------------------- | ----- |----- |
| 0 | 27036 | 2021-11-20 04:17:07 00:00 | 12.55 |a |
| 1 | 1748 | 2020-11-27 00:23:33 00:00 | 44.70 |b |
| 2 | 8849 | 2020-11-28 22:51:57 00:00 | 6.84 |c |
| 3 | 10280 | 2022-02-08 07:15:09 00:00 | 15.73 |d |
| 4 | 28712 | 2022-01-21 09:10:38 00:00 | 2.12 |e |
| 5 | 27036 | 2020-11-12 12:57:58 00:00 | 27.11 |a |
| 6 | 1748 | 2021-10-26 23:03:33 00:00 | 8.73 |b |
| 7 | 8849 | 2022-01-01 09:00:10 00:00 | 18.17 |c |
| 8 | 10280 | 2022-02-14 05:34:04 00:00 | 4.2 |d |
| 9 | 28712 | 2022-02-13 05:52:01 00:00 | 13.64 |e |
| 10| 27036 | 2022-03-08 10:50:06 00:00 | 21.62 |a |
| 11| 1748 | 2020-10-08 09:53:50 00:00 | 64.26 |e |
Where a,b,c,d and e refer to the average timedelta for each product
CodePudding user response:
You should probably sort on the date first. Then groupby
after calculating the differences to get the mean as a new column:
#calculate the average time difference
df["interval"] = df.groupby("product_id")["delivered_datetime"].transform(lambda x: x.sort_values().diff().mean())
#convert interval to days if needed
df["interval"] = df["interval"].dt.total_seconds().div(86400)
>>> df
product_id delivered_datetime price interval
0 27036 2021-11-20 04:17:07 00:00 12.55 160.303735
1 1748 2020-11-27 00:23:33 00:00 44.70 127.849471
2 8849 2020-11-28 22:51:57 00:00 6.84 132.807458
3 10280 2022-02-08 07:15:09 00:00 15.73 116.917423
4 28712 2022-01-21 09:10:38 00:00 2.12 18.018002
5 27036 2020-11-12 12:57:58 00:00 27.11 160.303735
6 1748 2021-10-26 23:03:33 00:00 8.73 127.849471
7 8849 2022-01-01 09:00:10 00:00 18.17 132.807458
8 10280 2022-02-14 05:34:04 00:00 4.26 116.917423
9 28712 2022-02-13 05:52:01 00:00 13.64 18.018002
10 27036 2022-03-08 10:50:06 00:00 21.62 160.303735
11 1748 2020-10-08 09:53:50 00:00 64.26 127.849471
12 8849 2021-03-23 14:33:21 00:00 14.72 132.807458
13 10280 2021-03-21 13:19:59 00:00 11.60 116.917423
14 28712 2021-12-31 13:41:11 00:00 15.05 18.018002
15 27036 2021-05-17 03:09:10 00:00 14.79 160.303735
16 1748 2021-05-17 01:06:48 00:00 16.73 127.849471
17 8849 2021-06-10 15:47:14 00:00 3.14 132.807458
18 10280 2022-03-07 07:23:15 00:00 39.82 116.917423
19 28712 2022-02-23 14:58:57 00:00 25.56 18.018002