I have a table like:
id category value1 value2 value3
1 1 100 324 940
1 1 222 404 1000
1 1 333 304 293
1 2 490 490 400
1 2 140 400 499
1 3 400 400 103
1 3 300 123 124
I need to calculate the P50 and P90 values for each (id, category) combination. My original approach:
SELECT DISTINCT
id
, category
, PERCENTILE_CONT(value1, 0.5) OVER (PARTITION BY id, category) AS p50_value1
, PERCENTILE_CONT(value2, 0.5) OVER (PARTITION BY id, category) AS p50_value2
, PERCENTILE_CONT(value3, 0.5) OVER (PARTITION BY id, category) AS p50_value3
, PERCENTILE_CONT(value1, 0.9) OVER (PARTITION BY id, category) AS p90_value1
, PERCENTILE_CONT(value2, 0.9) OVER (PARTITION BY id, category) AS p90_value2
, PERCENTILE_CONT(value3, 0.9) OVER (PARTITION BY id, category) AS p90_value3
FROM my_table;
I'm using the above code in a longer query, but it fails with the error below, so I'm hoping to optimize it:
Resources exceeded during query execution. The query could not be executed in the allotted memory. Peak usage: 133% of limit. Top memory consumer: OVER() clauses: 100%
Is there any way to GROUP BY id, category here instead?
Alternatively, would the following method be faster or is it just a cleaner way of writing the same query?
SELECT DISTINCT
id
, category
, PERCENTILE_CONT(value1, 0.5) OVER w AS p50_value1
, PERCENTILE_CONT(value2, 0.5) OVER w AS p50_value2
, PERCENTILE_CONT(value3, 0.5) OVER w AS p50_value3
, PERCENTILE_CONT(value1, 0.9) OVER w AS p90_value1
, PERCENTILE_CONT(value2, 0.9) OVER w AS p90_value2
, PERCENTILE_CONT(value3, 0.9) OVER w AS p90_value3
FROM my_table
WINDOW w AS (PARTITION BY id, category);
Answer:
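You can use the APPROX_QUANTILES aggregate function, which works with GROUP BY instead of OVER(). Note that it returns approximate results. With 10 buckets it returns an array of 11 boundaries, so offset 5 is the approximate P50 and offset 9 the approximate P90: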
SELECT
id,
category,
approx_quantiles(value1, 10)[offset(5)] as p50,
approx_quantiles(value1, 10)[offset(9)] as p90
FROM my_table
GROUP BY 1,2
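The query above covers only value1. As a rough sketch, the same idea can be extended to all three columns from the question, assuming the same my_table and column names. It uses 100 buckets instead of 10, so APPROX_QUANTILES returns 101 boundaries and the offsets line up with percentiles directly. The CTE name q and the array aliases q1, q2, q3 are made up for illustration. The results are approximate, so they may differ slightly from what PERCENTILE_CONT would return.

```sql
-- Sketch only: assumes my_table(id, category, value1, value2, value3) as in the question.
-- APPROX_QUANTILES(x, 100) returns 101 boundaries (minimum, 1st percentile, ..., maximum),
-- so OFFSET(50) approximates P50 and OFFSET(90) approximates P90.
WITH q AS (
  SELECT
    id,
    category,
    APPROX_QUANTILES(value1, 100) AS q1,  -- hypothetical alias
    APPROX_QUANTILES(value2, 100) AS q2,  -- hypothetical alias
    APPROX_QUANTILES(value3, 100) AS q3   -- hypothetical alias
  FROM my_table
  GROUP BY id, category
)
SELECT
  id,
  category,
  q1[OFFSET(50)] AS p50_value1,
  q2[OFFSET(50)] AS p50_value2,
  q3[OFFSET(50)] AS p50_value3,
  q1[OFFSET(90)] AS p90_value1,
  q2[OFFSET(90)] AS p90_value2,
  q3[OFFSET(90)] AS p90_value3
FROM q;
```

Building each quantile array once in the CTE and indexing it twice avoids repeating the same APPROX_QUANTILES call for P50 and P90, and the GROUP BY means no SELECT DISTINCT or OVER() clause is needed.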