Home > Enterprise >  Querying the median time between the second and third profile edit
Querying the median time between the second and third profile edit

Time:11-14

I have the following table;

| uuid | user_id | user_created | profile_edited |
|:---- |:------:| -----:|:------:|:------:|
| 1c5d134c  | user_3    | 2022-11-10T19:09:05 00:00 | 2022-11-18T18:00:05 00:00
| 1c5d134b  | user_3    | 2022-11-10T19:09:05 00:00 | 2022-11-15T18:00:05 00:00
| 1c5d134a  | user_3    | 2022-11-10T19:09:05 00:00 | 2022-11-14T18:00:05 00:00
| 1c5d434a  | user_1    | 2022-11-10T19:09:05 00:00 | 2022-11-13T19:09:05 00:00
| 1c8b424b  | user_1    | 2022-11-10T19:09:05 00:00 | 2022-11-10T16:09:05 00:00
| 1c8b424c  | user_2    | 2022-11-01T19:09:05 00:000 | 2022-11-19T19:09:05 00:00
| 1c8b424a  | user_2    | 2022-11-01T19:09:05 00:000 | 2022-11-22T19:09:05 00:00
| 1c8b424b  | user_2    | 2022-11-01T19:09:05 00:000 | 2022-11-24T19:09:05 00:00
| 1c5d434c  | user_1    | 2022-11-10T19:09:05 00:00 | 2022-11-18T19:09:05 00:00
| 1c5d434e  | user_1    | 2022-11-10T19:09:05 00:00 | 2022-11-16T19:09:05 00:00
| 1c5d434c  | user_1    | 2022-11-10T19:09:05 00:00 | 2022-11-14T19:09:05 00:00

In the example above we can see the second and third profile edit times as below;

| user_id | second_edit | third_edit | difference
|:---- |:------:| -----:|:------:|:------:|
| user_1  | 2022-11-13T19:09:05 00:00 | 2022-11-14T19:09:05 00:00 | 24 hours
| user_2  | 2022-11-22T19:09:05 00:00 | 2022-11-24T19:09:05 00:00 |48 hours
| user_3  | 2022-11-15T18:00:05 00:00 | 2022-11-18T18:00:05 00:00 | 72 hours

And the resulting query should find the median between 24, 48, and 72.

CodePudding user response:

In POSTGRES you can do something like this.

first get the row_number for every uuid descending by the date edited, The rest is getting the difference.

You can use this and derive similar solutions for MySQL

WITH CTE AS(
SELECT 
"uuid", "user_id", "user_created","profile_edited",
ROW_NUMBER() OVER (PARTITION BY "uuid" ORDER BY "profile_edited" DESC) rn
FROM TableB)
SELECT
  "uuid",string_agg(DISTINCT "user_id", ','), MAX("user_created")  "user_created",
 AGE((SELECT "profile_edited" FROM CTE WHERE rn = 2 AND "uuid" = CTE."uuid"   LIMIT 1)
  , (SELECT "profile_edited" FROM CTE WHERE rn = 3 AND "uuid" = CTE."uuid"  LIMIT 1)) AS difference
  FROM CTE
GROUP BY "uuid"
uuid string_agg user_created difference
1c5d434c user_1 2022-11-10 19:09:05 2 days
1c8b424a user_1,user_2 2022-11-10 19:09:05 2 days
SELECT 2

fiddle

To your changed Question

WITH CTE AS(
SELECT 
 "user_id", "user_created","profile_edited",
ROW_NUMBER() OVER (PARTITION BY "user_id" ORDER BY "profile_edited" ASC) rn
FROM TableB),
  CTE2 AS(

  SELECT
 "user_id"
  ,MAX("profile_edited") filter (where rn = 2 AND "user_id" = CTE."user_id") "second"
  ,MAX("profile_edited") filter (where rn = 3 AND "user_id" = CTE."user_id") "third",
   abs(extract(epoch from
  MAX("profile_edited") filter (where rn = 2 AND "user_id" = CTE."user_id") -
  MAX("profile_edited") filter (where rn = 3 AND "user_id" = CTE."user_id")
  )/3600)::INTEGER AS difference_hours
  FROM CTE
  GROUP BY "user_id")
SELECT * FROM CTE2
UNION ALL
  SELECT 'Median',NULL,NULL,AVG(difference_hours)::INTEGER FROM CTE2 

user_id second third difference_hours
user_1 2022-11-13 19:09:05 2022-11-14 19:09:05 24
user_2 2022-11-22 19:09:05 2022-11-24 19:09:05 48
user_3 2022-11-15 18:00:05 2022-11-18 18:00:05 72
Median null null 48
SELECT 4
  

fiddle

CodePudding user response:

In postgres you can use the percentile_cont ordered-set aggregate function to find the median as the following:

with edit_diff as
(
  select round(extract('epoch' from max(profile_edited)-min(profile_edited))/3600) difference
  from
  (
    select *, 
      row_number() over (partition by user_id order by profile_edited) rn
    from table_name
  ) T
  where rn in(2,3)
  group by user_id
)
select percentile_cont(0.5) within group (order by difference) median
from edit_diff

See a demo.

OUTPUT: 48

For MySQL you can use row_number() and count functions to calculate the median as the following:

with edit_diff as
(
  select timestampdiff(hour, min(profile_edited), max(profile_edited)) difference
  from
  (
    select *, 
      row_number() over (partition by user_id order by profile_edited) rn
    from table_name
  ) T
  where rn in(2,3)
  group by user_id
)

select round(avg(difference), 2) median
from
(
  select difference,
         row_number() over (order by difference) diffOrder,
        count(*) over () cn
  from edit_diff
) T
where diffOrder between floor((cn   1) / 2) and ceil((cn   1) / 2);
  •  Tags:  
  • sql
  • Related