I have a table mydata
in a Postgres 14 database, with the following relevant columns:
ftype
: anenum
havingfoo
,bar
andbaz
status
: anotherenum
havingpending
,failed
,success
I want the success rate of different types. Success rate is basically: the number of rows where the status
is success
divided by total number of rows for that ftype
.
Currently, I'm doing the following:
SELECT
COALESCE(
COUNT(CASE WHEN ftype = 'foo' AND status = 'success' THEN 1 END) /
NULLIF(COUNT(CASE WHEN ftype = 'foo' THEN 1 END)::real, 0)
,0)::real AS foo_rate,
COALESCE(
COUNT(CASE WHEN ftype = 'bar' AND status = 'success' THEN 1 END) /
NULLIF(COUNT(CASE WHEN ftype = 'bar' THEN 1 END)::real, 0)
,0)::real AS bar_rate,
COALESCE(
COUNT(CASE WHEN ftype = 'baz' AND status = 'success' THEN 1 END) /
NULLIF(COUNT(CASE WHEN ftype = 'baz' THEN 1 END)::real, 0)
,0)::real AS baz_rate,
FROM mydata;
Is there a better/more performant way? How can I optimize it?
Would using PARTITION
in the query help?
CodePudding user response:
Since you are computing rates for all types, a pivoted result would be simpler:
SELECT ftype, CASE WHEN ct_success = 0 THEN 0 ELSE ct_success/ct_total END AS rate
FROM (
SELECT ftype
, count(*) AS ct_total
, count(*) FILTER (WHERE status = 'success')::real AS ct_success
FROM mydata
GROUP BY 1
) sub;
Works for a selection of types, too. (The benefit is smaller, though.) Filter early:
SELECT ftype, CASE WHEN ct_success = 0 THEN 0 ELSE ct_success/ct_total END AS rate
FROM (
SELECT ftype
, count(*) AS ct_total
, count(*) FILTER (WHERE status = 'success')::real AS ct_success
FROM mydata
WHERE ftype = ANY ('{foo,bar,baz}'::my_enum_type[]) -- here!
GROUP BY 1
) sub;
About the WHERE
clause:
You may want to round()
and/or use numeric
to begin with. See:
About the aggregate FILTER
clause:
Whether table partitioning might help is hard to tell from the minimal information we have. There are useful guidelines in the manual chapter "Overview" for "Table Partitioning". My educated guess is that it wouldn't benefit you.