I am getting two different counts in sql query-CodePudding

here's my sql query:

SELECT COUNT(*) AS member_count
FROM (SELECT DISTINCT p.patient_id, 
                      intrv_type, 
                      service_date
      FROM intervention_service AS p 
      JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
      WHERE cohort_id = "202"
      AND p.intrv_type IN (4088)
      AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31" 
      )  AS x;

//235 count

SELECT COUNT(*) AS member_count 
FROM ( SELECT DISTINCT p.patient_id, 
                       intrv_type
       FROM intervention_service AS p 
       JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
       WHERE cohort_id = "202"
       AND p.intrv_type IN (4088)
       AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"
     ) AS x;

//232 count

When I add the service date field and when I do not, I am getting a different count. Can someone please explain why this is happening.

CodePudding user response：

when you SELECT DISTINCT, it will get the unique values for all the columns you select (see https://www.w3resource.com/sql/select-statement/queries-with-distinct-multiple-columns.php)

So you probable have rows that have the same patient_id and intrv_type but different service_date

CodePudding user response：

Take a look at https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html

These 2 queries are equal

SELECT DISTINCT c1, c2, c3 FROM t1 WHERE c1 > const;

SELECT c1, c2, c3 FROM t1 WHERE c1 > const GROUP BY c1, c2, c3;

So, when you include service_date in DISTINCT clause as SELECT DISTINCT p.patient_id, intrv_type, service_date ... you make the rows to be grouped by all 3 column values. That is all rows with same patient_id, intrv_type and service_date are treated as one row. So you get as much rows as different 3 values can be.

But when you do not include it as SELECT DISTINCT p.patient_id, intrv_type ... make them to be grouped only by 2 column values. So you get as much rows as different patient_id with intrv_type exists.

CodePudding user response：

As ussu already answered, it will probably be a duplicated Entry getting filtered out by the distinct keyword between the joined tables. Try to execute the subquery on its own so you can have a look at what's different with the data in the table:

SELECT DISTINCT p.patient_id, 
                       intrv_type
       FROM intervention_service AS p 
       JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
       WHERE cohort_id = "202"
       AND p.intrv_type IN (4088)
       AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"