Home > Net >  GROUP BY returns wrong count in SnowFlake
GROUP BY returns wrong count in SnowFlake

Time:12-30

Below is a row from a SnowFlake query. This is the only row in the result with this information (i.e., this row is unique).

ID   ACCOUNT_NUMBER  DATE_1       DATE_2
123  347             2017-10-19   2017-10-29

I ran a GROUP BY like below to count the number of rows in each group. I got 3 for the above row. Shouldn't I get 1?

SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*) 
FROM TABLE GROUP BY 1, 2, 3, 4;
ID   ACCOUNT_NUMBER  DATE_1       DATE_2       COUNT
123  347             2017-10-19   2017-10-29   3

I expected to see count of 1 for this row, but I got 3.

CodePudding user response:

The result is correct. The DISTINCT is applied after the grouping and has not effect in provided query.

enter image description here

To apply DISTINCT it should be provided before grouping(subquery)

SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*) 
FROM (SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2 FROM TAB)
GROUP BY 1, 2, 3, 4;

or as a part of aggregate function:

SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, 
       COUNT(DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2) 
FROM TAB
GROUP BY 1, 2, 3, 4;

enter image description here

For sample data:

CREATE OR REPLACE TABLE TAB(ID INT,
                           ACCOUNT_NUMBER INT,
                           DATE_1 TEXT,
                           DATE_2 TEXT)
AS
SELECT 123,  347, '2017-10-19', '2017-10-29' UNION ALL 
SELECT 123,  347, '2017-10-19', '2017-10-29' UNION ALL 
SELECT 123,  347, '2017-10-19', '2017-10-29';
  • Related