here's my sql query:
SELECT COUNT(*) AS member_count
FROM (SELECT DISTINCT p.patient_id,
intrv_type,
service_date
FROM intervention_service AS p
JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
WHERE cohort_id = "202"
AND p.intrv_type IN (4088)
AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"
) AS x;
//235 count
SELECT COUNT(*) AS member_count
FROM ( SELECT DISTINCT p.patient_id,
intrv_type
FROM intervention_service AS p
JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
WHERE cohort_id = "202"
AND p.intrv_type IN (4088)
AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"
) AS x;
//232 count
When I add the service date field and when I do not, I am getting a different count. Can someone please explain why this is happening.
CodePudding user response:
when you SELECT DISTINCT
, it will get the unique values for all the columns you select (see https://www.w3resource.com/sql/select-statement/queries-with-distinct-multiple-columns.php)
So you probable have rows that have the same patient_id
and intrv_type
but different service_date
CodePudding user response:
Take a look at https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html
These 2 queries are equal
SELECT DISTINCT c1, c2, c3 FROM t1 WHERE c1 > const;
SELECT c1, c2, c3 FROM t1 WHERE c1 > const GROUP BY c1, c2, c3;
So, when you include service_date
in DISTINCT
clause as SELECT DISTINCT p.patient_id, intrv_type, service_date ...
you make the rows to be grouped by all 3 column values. That is all rows with same patient_id
, intrv_type
and service_date
are treated as one row. So you get as much rows as different 3 values can be.
But when you do not include it as SELECT DISTINCT p.patient_id, intrv_type ...
make them to be grouped only by 2 column values. So you get as much rows as different patient_id
with intrv_type
exists.
CodePudding user response:
As ussu already answered, it will probably be a duplicated Entry getting filtered out by the distinct keyword between the joined tables. Try to execute the subquery on its own so you can have a look at what's different with the data in the table:
SELECT DISTINCT p.patient_id,
intrv_type
FROM intervention_service AS p
JOIN map_cohort_patient AS m ON p.patient_id = m.patient_id
WHERE cohort_id = "202"
AND p.intrv_type IN (4088)
AND DATE(service_date) BETWEEN "2020-07-01" AND "2021-07-31"