I have a table that looks like this. And I want to get the distinct count across the three columns.
ID | Column1 | Column 2 | Column 3 |
---|---|---|---|
1 | A | B | C |
2 | A | A | B |
3 | A | A |
The desired output I'm looking for is:
ID | Column1 | Column 2 | Column 3 | unique_count |
---|---|---|---|---|
1 | A | B | C | 3 |
2 | A | A | B | 2 |
3 | A | A | 1 |
CodePudding user response:
1
case when C1 not in (C2, C3) then 1 else 0 end
case when C2 not in (C3) then 1 else 0 end
This will not work if you intend to count nulls. The pattern would extend to more columns by successively comparing each one to all columns to its right. The order doesn't strictly matter. There's just no point in repeating the same test over and over.
If they were alphabetically ordered them you could test only adjacent pairs to look for differences. While that applies to your limited sample it would not be the most general case.
CodePudding user response:
One possible option would be
WITH sample AS (
SELECT 'A' Column1, 'B' Column2, 'C' Column3 UNION ALL
SELECT 'A', 'A', 'B' UNION ALL
SELECT 'A', 'A', NULL
)
SELECT Column1, Column2, Column3, COUNT(DISTINCT c) unique_count
FROM (SELECT *, ROW_NUMBER() OVER () rn FROM sample) t LATERAL VIEW EXPLODE(ARRAY(Column1, Column2, Column3)) tf AS c
GROUP BY Column1, Column2, Column3, rn;
output
--------- --------- --------- --------------
| column1 | column2 | column3 | unique_count |
--------- --------- --------- --------------
| A | A | NULL | 1 |
| A | A | B | 2 |
| A | B | C | 3 |
--------- --------- --------- --------------