Home > Net >  Remove duplicate before group by SQL
Remove duplicate before group by SQL

Time:06-06

I need remove duplicates in SELECT before count them. I need to count how many test have been done. A patient can't get same value for the same test. I had tried DISTINCT but it would be a GROUP BY(DISTINCT) instead of GROUP BY(GROUP BY). It is pretty much the same but DISTINCT will remove other columns I need in patient_test table. I used a view as a good practice, but I'm not sure that is helping.

TLDR: I want to improve a nested GROUP BY, but I can't find a way.

table: patient_test (12 rows)

Patient | Test      | Value
--------------------------------
Laura   | Test A    |  1.3
Laura   | Test B    |  8.9 <-          
Laura   | Test B    |  8.9 <-  
Laura   | Test C    |  8.9           
Charles | Test B    |  8.9           
Charles | Test B    |  4.6           
Charles | Test C    |  5.6           
Maria   | Test A    |  1.2 <--
Maria   | Test A    |  1.2 <--
Maria   | Test B    |  1.2
Maria   | Test B    |  8.9
Maria   | Test C    |  1.2

I exclude duplicates this way

SELECT * FROM patient_test 
GROUP BY Patient , Test, Value 

to get:

table: patient_test (10 rows)

Patient | Test      | Value
--------------------------------
Laura   | Test A    |  1.3
Laura   | Test B    |  8.9         
Laura   | Test C    |  8.9           
Charles | Test B    |  8.9           
Charles | Test B    |  4.6           
Charles | Test C    |  5.6           
Maria   | Test A    |  1.2  
Maria   | Test B    |  1.2
Maria   | Test B    |  8.9
Maria   | Test C    |  1.2

Then I count them this way: (What I want to improve)

SELECT TestName, count(*) AS Result FROM 
(SELECT * FROM patient_test 
GROUP BY Patient , Test, Value) AS tempPatient
GROUP BY tempPatient.TestName;

I'm using a view, but I think that exclude duplicates and them GROUP BY right away could be better.

SELECT TestName, count(*) AS Result FROM 
(CustomView) AS tempPatient
GROUP BY tempPatient.TestName;

This is what I need:

TestName  | Result
----------------------
Test A    |  2
Test B    |  5
Test C    |  3 

I think that is not necessary to run GROUP BY twice. I would like to find a way to exclude duplicates using the examples. Any help or suggestion would be nice.

CodePudding user response:

Seems fine what you have done. I would do something very similar:

SELECT 
    T.Test,
    COUNT(*) AS Result 
FROM 
    (
        SELECT DISTINCT 
            Patient,
            Test,
            Value
        FROM 
            patient_test
        ) T
GROUP BY
    T.Test
;

If you have duplicates in the table, you have to use grouping or distinct to remove them - that is pretty much standard operating procedure in a case like this.

CodePudding user response:

On MySQL 8 you can use row_number , the partition by patient,test,value will create unique values. If the same values exists row_num will be >1.

with cte as 
( select *, row_number () over(partition by patient,test,value ) as row_num
  from patient_test
) select * 
  from cte;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=089755b482817991b06e58a5caa675e9

To count only unique values add where row_num=1 condition

with cte as 
( select *, row_number () over(partition by patient,test,value ) as row_num
  from patient_test
) select test,count(value) as nr_count
  from cte
  where row_num=1
  group by test
  order by test asc;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e1841a84eb872d1c047c2c969d80a002

CodePudding user response:

Unlike other databases MySql allows more than one columns referenced with DISTINCT inside COUNT():

SELECT Test, COUNT(DISTINCT Patient, Value) AS Result
FROM patient_test 
GROUP BY Test;

See the demo.

  • Related