Home > OS >  Average time between delivered_dates for each product_id - pandas
Average time between delivered_dates for each product_id - pandas

Time:04-09

How to find average time between delivered_date for each product_id?

Dataset Sample

    data = {
    'product_id':[27036,1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280,28712],
    'delivered_datetime':pd.to_datetime(["2021-11-20 04:17:07 UTC", "2020-11-27 00:23:33 UTC", "2020-11-28 22:51:57 UTC", "2022-02-08 07:15:09 UTC","2022-01-21 09:10:38 UTC", "2020-11-12 12:57:58 UTC","2021-10-26 23:03:33 UTC","2022-01-01 09:00:10 UTC", "2022-02-14 05:34:04 UTC", "2022-02-13 05:52:01 UTC", "2022-03-08 10:50:06 UTC", "2020-10-08 09:53:50 UTC", "2021-03-23 14:33:21 UTC", "2021-03-21 13:19:59 UTC", "2021-12-31 13:41:11 UTC", "2021-05-17 03:09:10 UTC", "2021-05-17 01:06:48 UTC", "2021-06-10 15:47:14 UTC", "2022-03-07 07:23:15 UTC","2022-02-23 14:58:57 UTC"]),
    'price':[12.55, 44.7, 6.84, 15.73, 2.12, 27.11, 8.73, 18.17, 4.26, 13.64, 21.62, 64.26, 14.72, 11.6, 15.05, 14.79, 16.73, 3.14, 39.82, 25.56
],

}
df =pd.DataFrame(data)
|id | product_id| delivered_date            | price |
|-- | ----------| ------------------------- | ----- |
| 0 | 27036     | 2021-11-20 04:17:07 00:00 | 12.55 |
| 1 | 1748      | 2020-11-27 00:23:33 00:00 | 44.70 |
| 2 | 8849      | 2020-11-28 22:51:57 00:00 | 6.84  |
| 3 | 10280     | 2022-02-08 07:15:09 00:00 | 15.73 |
| 4 | 28712     | 2022-01-21 09:10:38 00:00 | 2.12  |
| 5 | 27036     | 2020-11-12 12:57:58 00:00 | 27.11 |
| 6 | 1748      | 2021-10-26 23:03:33 00:00 | 8.73  |
| 7 | 8849      | 2022-01-01 09:00:10 00:00 | 18.17 |
| 8 | 10280     | 2022-02-14 05:34:04 00:00 | 4.2   |
| 9 | 28712     | 2022-02-13 05:52:01 00:00 | 13.64 |
| 10| 27036     | 2022-03-08 10:50:06 00:00 | 21.62 |
| 11| 1748      | 2020-10-08 09:53:50 00:00 | 64.26 |

Average time between delivered_dates for each product_id?

    df['interval'] = df.groupby('product_id')['delivered_datetime']\
                  .diff().mean()

This produces one time interval for all rows, rather than separate averages for each product.

Expected output

|id | product_id| delivered_date            | price |Exp_out|
|-- | ----------| ------------------------- | ----- |----- |
| 0 | 27036     | 2021-11-20 04:17:07 00:00 | 12.55 |a     |
| 1 | 1748      | 2020-11-27 00:23:33 00:00 | 44.70 |b     |
| 2 | 8849      | 2020-11-28 22:51:57 00:00 | 6.84  |c     |
| 3 | 10280     | 2022-02-08 07:15:09 00:00 | 15.73 |d     |
| 4 | 28712     | 2022-01-21 09:10:38 00:00 | 2.12  |e     |
| 5 | 27036     | 2020-11-12 12:57:58 00:00 | 27.11 |a     |
| 6 | 1748      | 2021-10-26 23:03:33 00:00 | 8.73  |b     |
| 7 | 8849      | 2022-01-01 09:00:10 00:00 | 18.17 |c     |
| 8 | 10280     | 2022-02-14 05:34:04 00:00 | 4.2   |d     |
| 9 | 28712     | 2022-02-13 05:52:01 00:00 | 13.64 |e     |
| 10| 27036     | 2022-03-08 10:50:06 00:00 | 21.62 |a     |
| 11| 1748      | 2020-10-08 09:53:50 00:00 | 64.26 |e     |

Where a,b,c,d and e refer to the average timedelta for each product

CodePudding user response:

You should probably sort on the date first. Then groupby after calculating the differences to get the mean as a new column:

#calculate the average time difference
df["interval"] = df.groupby("product_id")["delivered_datetime"].transform(lambda x: x.sort_values().diff().mean())

#convert interval to days if needed
df["interval"] = df["interval"].dt.total_seconds().div(86400)

>>> df
    product_id        delivered_datetime  price    interval
0        27036 2021-11-20 04:17:07 00:00  12.55  160.303735
1         1748 2020-11-27 00:23:33 00:00  44.70  127.849471
2         8849 2020-11-28 22:51:57 00:00   6.84  132.807458
3        10280 2022-02-08 07:15:09 00:00  15.73  116.917423
4        28712 2022-01-21 09:10:38 00:00   2.12   18.018002
5        27036 2020-11-12 12:57:58 00:00  27.11  160.303735
6         1748 2021-10-26 23:03:33 00:00   8.73  127.849471
7         8849 2022-01-01 09:00:10 00:00  18.17  132.807458
8        10280 2022-02-14 05:34:04 00:00   4.26  116.917423
9        28712 2022-02-13 05:52:01 00:00  13.64   18.018002
10       27036 2022-03-08 10:50:06 00:00  21.62  160.303735
11        1748 2020-10-08 09:53:50 00:00  64.26  127.849471
12        8849 2021-03-23 14:33:21 00:00  14.72  132.807458
13       10280 2021-03-21 13:19:59 00:00  11.60  116.917423
14       28712 2021-12-31 13:41:11 00:00  15.05   18.018002
15       27036 2021-05-17 03:09:10 00:00  14.79  160.303735
16        1748 2021-05-17 01:06:48 00:00  16.73  127.849471
17        8849 2021-06-10 15:47:14 00:00   3.14  132.807458
18       10280 2022-03-07 07:23:15 00:00  39.82  116.917423
19       28712 2022-02-23 14:58:57 00:00  25.56   18.018002
  • Related