Home > Back-end >  How to find mode of multiple columns in Snowflake SQL
How to find mode of multiple columns in Snowflake SQL

Time:01-25

Input column example :

ID Column A Column B Column C
1 cat cat dog
2 dog cat dog
3 cat cat dog
4 bird cat dog

Output column example :

ID Column A Column B Column C Mode
1 cat cat dog cat
2 dog cat dog dog
3 cat cat dog cat
4 bird cat bird bird

So far I have only calculated mode for a single column. Not sure how we can do it horizontally by combining 4 columns.

CodePudding user response:

We can use an unpivot approach with the help of a union query. Then, use ROW_NUMBER() to select the mode:

WITH cte AS (
    SELECT ID, ColumnA AS val FROM yourTable UNION ALL
    SELECT ID, ColumnB FROM yourTable UNION ALL
    SELECT ID, ColumnC FROM yourTable
),
cte2 AS (
    SELECT *, COUNT(*) OVER (PARTITION BY ID, val) cnt
    FROM cte
),
cte3 AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY cnt DESC, val) rn
    FROM cte2
)

SELECT t1.ColumnA, t1.ColumnB, t1.ColumnC, t2.val AS Mode
FROM yourTable t1
INNER JOIN cte3 t2
    ON t2.ID = t1.ID
WHERE t2.rn = 1
ORDER BY t1.ID;

In the event that two or more values are tied for the mode, it breaks the tie by arbitrarily returning the alphabetically lower value.

CodePudding user response:

I think you can use the built in MODE function for this, and Snowflake semi-structured functionality to unpivot and apply it. Check the behaviour of MODE suits your needs with regards to breaking ties and null handling etc.

First create test data like your example (please include code to repro in future!)

create view YOURTABLE as
select
  ID,
  COLUMN_A,
  COLUMN_B,
  COLUMN_C
from  (values
      (1,'cat','cat','dog'),
      (2,'dog','cat','dog'),
      (3,'cat','cat','dog'),
      (4,'bird','cat','dog')
      ) vw (ID, COLUMN_A, COLUMN_B, COLUMN_C);

Here's the query to get your output;

with o_r as (Select ID, COLUMN_A, COLUMN_B, COLUMN_C,
                    array_construct(COLUMN_A, COLUMN_B, COLUMN_C) arr_row from YOURTABLE )
    select ID, COLUMN_A, COLUMN_B, COLUMN_C, mode(value::VARCHAR) MODE  
      from o_r, lateral flatten (input => o_r.arr_row) lf
     group by 1,2,3,4
     order by 1;

Gather up the columns we want to calculate MODE over with array_construct(),lateral flatten the array, and then group by your ID, and columns with MODE on the flattened VALUE column.

  • Related