I have the following query:
select distinct ProdQty, JobCompletionDate, JobHead.JobNum from erp.JobHead
inner join erp.LaborDtl on JobHead.JobNum = LaborDtl.JobNum and JobHead.Company = LaborDtl.Company
where JobCompletionDate = '2022-01-04' and JobHead.Company = 'TD' and LaborDtl.JCDept ='MS'
It returns the ProdQty
for every JobNum
for a given JobCompletionDate
Here is an excerpt of that result:
Prod Qty | JobCompletionDate | JobNum |
---|---|---|
12 | 2022-01-04 | 198583 |
1 | 2022-01-04 | 205388 |
2 | 2022-01-04 | 205562 |
I'm not going to paste the whole table here, but I hope the idea is clear. The reason I do distinct
in the selection is because there are usually duplicate entries for the ProdQty
, and using distinct
eliminates those.
Next, I need to GROUP BY
JobCompletionDate
to get the total of ProdQty
for that date. I am having issues using SUM()
on the ProdQty
field. When I sum the values from this column using Excel or a calculator, I get the value 7201, which is the correct value that I need.
However, when I perform this query:
select sum(distinct ProdQty) from erp.JobHead
inner join erp.LaborDtl on JobHead.JobNum = LaborDtl.JobNum and JobHead.Company = LaborDtl.Company
where JobCompletionDate = '2022-01-04' and JobHead.Company = 'TD' and LaborDtl.JCDept ='MS'
group by JobCompletionDate
My result is: 6660
Why? What am I doing incorrectly here?
CodePudding user response:
This:
select distinct ProdQty, JobCompletionDate, JobHead.JobNum...
is deciding what is distinct very differently than this:
select sum(distinct ProdQty)...
In the first example, it's removing duplicates where the combination of ProdQty, JobCompletionDate, and JobHead.JobNum are not distinct.
In the second example, it's removing duplicates where just ProdQty is not distinct. Since it's removing too many rows with the distinct as it's specified, the sum comes out lower than expected.
What you want to do is deduplicate the rows the way you know it's working (assuming that's the first query) and then perform the sum off of that intermediate result. You could use a CTE or subselect to do that. Here's an example:
select sum(ProdQty) from
(
select distinct ProdQty, JobCompletionDate, JobHead.JobNum from erp.JobHead
inner join erp.LaborDtl on JobHead.JobNum = LaborDtl.JobNum and JobHead.Company = LaborDtl.Company
where JobCompletionDate = '2022-01-04' and JobHead.Company = 'TD' and LaborDtl.JCDept ='MS'
);