Input column example :
ID | Column A | Column B | Column C |
---|---|---|---|
1 | cat | cat | dog |
2 | dog | cat | dog |
3 | cat | cat | dog |
4 | bird | cat | dog |
Output column example :
ID | Column A | Column B | Column C | Mode |
---|---|---|---|---|
1 | cat | cat | dog | cat |
2 | dog | cat | dog | dog |
3 | cat | cat | dog | cat |
4 | bird | cat | bird | bird |
So far I have only calculated mode for a single column. Not sure how we can do it horizontally by combining 4 columns.
CodePudding user response:
We can use an unpivot approach with the help of a union query. Then, use ROW_NUMBER()
to select the mode:
WITH cte AS (
SELECT ID, ColumnA AS val FROM yourTable UNION ALL
SELECT ID, ColumnB FROM yourTable UNION ALL
SELECT ID, ColumnC FROM yourTable
),
cte2 AS (
SELECT *, COUNT(*) OVER (PARTITION BY ID, val) cnt
FROM cte
),
cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY cnt DESC, val) rn
FROM cte2
)
SELECT t1.ColumnA, t1.ColumnB, t1.ColumnC, t2.val AS Mode
FROM yourTable t1
INNER JOIN cte3 t2
ON t2.ID = t1.ID
WHERE t2.rn = 1
ORDER BY t1.ID;
In the event that two or more values are tied for the mode, it breaks the tie by arbitrarily returning the alphabetically lower value.
CodePudding user response:
I think you can use the built in MODE function for this, and Snowflake semi-structured functionality to unpivot and apply it. Check the behaviour of MODE suits your needs with regards to breaking ties and null handling etc.
First create test data like your example (please include code to repro in future!)
create view YOURTABLE as
select
ID,
COLUMN_A,
COLUMN_B,
COLUMN_C
from (values
(1,'cat','cat','dog'),
(2,'dog','cat','dog'),
(3,'cat','cat','dog'),
(4,'bird','cat','dog')
) vw (ID, COLUMN_A, COLUMN_B, COLUMN_C);
Here's the query to get your output;
with o_r as (Select ID, COLUMN_A, COLUMN_B, COLUMN_C,
array_construct(COLUMN_A, COLUMN_B, COLUMN_C) arr_row from YOURTABLE )
select ID, COLUMN_A, COLUMN_B, COLUMN_C, mode(value::VARCHAR) MODE
from o_r, lateral flatten (input => o_r.arr_row) lf
group by 1,2,3,4
order by 1;
Gather up the columns we want to calculate MODE over with array_construct(),lateral flatten the array, and then group by your ID, and columns with MODE on the flattened VALUE column.