Home > Software design >  I am getting two different counts in sql query
I am getting two different counts in sql query

Time:07-15

here's my sql query:

SELECT COUNT(*) AS member_count
FROM (SELECT DISTINCT p.patient_id, 
                      intrv_type, 
                      service_date
      FROM intervention_service AS p 
      JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
      WHERE cohort_id = "202"
      AND p.intrv_type IN (4088)
      AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31" 
      )  AS x;

//235 count

SELECT COUNT(*) AS member_count 
FROM ( SELECT DISTINCT p.patient_id, 
                       intrv_type
       FROM intervention_service AS p 
       JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
       WHERE cohort_id = "202"
       AND p.intrv_type IN (4088)
       AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"
     ) AS x;

//232 count

When I add the service date field and when I do not, I am getting a different count. Can someone please explain why this is happening.

CodePudding user response:

when you SELECT DISTINCT, it will get the unique values for all the columns you select (see https://www.w3resource.com/sql/select-statement/queries-with-distinct-multiple-columns.php)

So you probable have rows that have the same patient_id and intrv_type but different service_date

CodePudding user response:

Take a look at https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html

These 2 queries are equal

SELECT DISTINCT c1, c2, c3 FROM t1 WHERE c1 > const;

SELECT c1, c2, c3 FROM t1 WHERE c1 > const GROUP BY c1, c2, c3;

So, when you include service_date in DISTINCT clause as SELECT DISTINCT p.patient_id, intrv_type, service_date ... you make the rows to be grouped by all 3 column values. That is all rows with same patient_id, intrv_type and service_date are treated as one row. So you get as much rows as different 3 values can be.

But when you do not include it as SELECT DISTINCT p.patient_id, intrv_type ... make them to be grouped only by 2 column values. So you get as much rows as different patient_id with intrv_type exists.

CodePudding user response:

As ussu already answered, it will probably be a duplicated Entry getting filtered out by the distinct keyword between the joined tables. Try to execute the subquery on its own so you can have a look at what's different with the data in the table:

SELECT DISTINCT p.patient_id, 
                       intrv_type
       FROM intervention_service AS p 
       JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
       WHERE cohort_id = "202"
       AND p.intrv_type IN (4088)
       AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"
  • Related