Home > OS >  How to identify which value appears the most in a SQL table (Snowflake) AND account for ties?
How to identify which value appears the most in a SQL table (Snowflake) AND account for ties?

Time:11-02

Let's say we have data as follows:

ID    tag     data            timestamp
001     A   walter  2021-06-04 09:46:25
005     F   junior  2021-06-05 09:47:25
001     B   junior  2021-06-04 09:47:25
002     C  soprano  2021-06-04 09:48:25
002     C     alto  2021-06-04 09:49:25
001     A    brown  2021-06-04 09:50:25
003     A   cleave  2021-06-04 09:51:25
003     B     land  2021-06-04 09:52:25
004     C   before  2021-06-04 09:53:25
005     H   junior  2021-06-04 09:47:25

I need to know which tag appears the most for each of the ID values. In the instance of a tie, please use the most recent tag for the ID, indicated by timestamp.

Expected Result:

ID   tag
001    A
002    C
003    B
004    C
005    F

CodePudding user response:

Using QUALIFY and RANK to filter the result of grouping:

SELECT ID, tag, COUNT(*) AS cnt, MAX(timestamp) AS max_t
FROM tab
GROUP BY ID, tag
QUALIFY RANK() OVER(PARTITION BY ID ORDER BY cnt DESC, max_t DESC) = 1

Sample data:

CREATE OR REPLACE TABLE tab(ID STRING, tag STRING, data  STRING, timestamp TIMESTAMP)
AS
          SELECT '001',     'A' ,' walter','2021-06-04 09:46:25'
UNION ALL SELECT '005',     'F' ,' junior','2021-06-05 09:47:25'
UNION ALL SELECT '001',     'B' ,' junior','2021-06-04 09:47:25'
UNION ALL SELECT '002',     'C' ,'soprano','2021-06-04 09:48:25'
UNION ALL SELECT '002',     'C' ,'   alto','2021-06-04 09:49:25'
UNION ALL SELECT '001',     'A' ,'  brown','2021-06-04 09:50:25'
UNION ALL SELECT '003',     'A' ,' cleave','2021-06-04 09:51:25'
UNION ALL SELECT '003',     'B' ,'   land','2021-06-04 09:52:25'
UNION ALL SELECT '004',     'C' ,' before','2021-06-04 09:53:25'
UNION ALL SELECT '005',     'H' ,' junior','2021-06-04 09:47:25';

Simplified query:

SELECT ID, tag
FROM tab
GROUP BY ID, tag
QUALIFY RANK() OVER(PARTITION BY ID ORDER BY COUNT(*) DESC, MAX(timestamp) DESC) = 1
ORDER BY ID;

Output:

enter image description here

  • Related