Home > Software design >  How to do count(distinct) for multiple columns
How to do count(distinct) for multiple columns

Time:11-05

This does not work:

select count(distinct colA, colB) from mytable

I know I can simply solve this by making a double select.

select count(*) from (
    select distinct colA, colB from mytable
)

Is there anyway I can do this without having to do the sub-select?

CodePudding user response:

Subquery is standard solution which I recommend too. Concatenation-based solutions, except they are error-prone if dangerous character occurs, might be also worse in performance.

Note: in case you collected obscure solutions how to avoid subquery, window function usage is also possible here (Not to be used in production - your code reviewers won't praise you for it):

select distinct count(*) over ()
from my_table
group by colA, colB

CodePudding user response:

[TL;DR] Just use a sub-query.


If you are trying to use concatenation then you need to ensure that you delimit the terms with a string that is never going to appear in the values otherwise you will find non-distinct terms grouped together.

For example: if you have a two numeric column then using COUNT(DISTINCT col1 || col2) will group together 1||23 and 12||3 and count them as one group.

You could use COUNT(DISTINCT col1 || '-' || col2) but if the columns are string values and you have 'ab-'||'-'||'c' and 'ab'||'-'||'-c' then, once again, they would be identical once concatenated.

The simplest method is to use a sub-query.

If you can't do that then you can combine columns via string-concatenation but you need to analyse the contents of the column and pick a delimiter that does not appear in your strings otherwise your results might be erroneous. Even better is to ensure that the delimiter character will never be in the sub-string with check constraints.

ALTER TABLE mytable ADD CONSTRAINT mytable__col1__chk CHECK (col1 NOT LIKE '%¬%');
ALTER TABLE mytable ADD CONSTRAINT mytable__col2__chk CHECK (col2 NOT LIKE '%¬%');

Then:

SELECT COUNT(DISTINCT col1 || '¬' || col2)
FROM   mytable;

CodePudding user response:

Just for fun, you can (ab)use window functions and limit clause. These are evaluated after grouping. So:

SELECT COUNT(*) OVER()
FROM t
GROUP BY col_a, col_b
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY

CodePudding user response:

If you're trying to avoid sub-selects at all costs, one variant would be to concatenate them as such:

SELECT count(DISTINCT concat(colA, colB)) FROM mytable;

CodePudding user response:

Concatenate them.

Select count(distinct colA ||'-'|| colB) from mytable;
  • Related