Having issues getting the SUM() of a column with GROUP BY and DISTINCT-CodePudding

I have the following query:

select distinct ProdQty, JobCompletionDate, JobHead.JobNum from erp.JobHead 
inner join erp.LaborDtl on JobHead.JobNum = LaborDtl.JobNum and JobHead.Company = LaborDtl.Company
where JobCompletionDate = '2022-01-04' and JobHead.Company = 'TD' and LaborDtl.JCDept ='MS'

It returns the ProdQty for every JobNum for a given JobCompletionDate Here is an excerpt of that result:

Prod Qty	JobCompletionDate	JobNum
12	2022-01-04	198583
1	2022-01-04	205388
2	2022-01-04	205562

I'm not going to paste the whole table here, but I hope the idea is clear. The reason I do distinct in the selection is because there are usually duplicate entries for the ProdQty, and using distinct eliminates those.

Next, I need to GROUP BY JobCompletionDate to get the total of ProdQty for that date. I am having issues using SUM() on the ProdQty field. When I sum the values from this column using Excel or a calculator, I get the value 7201, which is the correct value that I need.

However, when I perform this query:

select sum(distinct ProdQty) from erp.JobHead 
inner join erp.LaborDtl on JobHead.JobNum = LaborDtl.JobNum and JobHead.Company = LaborDtl.Company
where JobCompletionDate = '2022-01-04' and JobHead.Company = 'TD' and LaborDtl.JCDept ='MS'
group by JobCompletionDate

My result is: 6660

Why? What am I doing incorrectly here?

CodePudding user response：

This:

select distinct ProdQty, JobCompletionDate, JobHead.JobNum...

is deciding what is distinct very differently than this:

select sum(distinct ProdQty)...

In the first example, it's removing duplicates where the combination of ProdQty, JobCompletionDate, and JobHead.JobNum are not distinct.

In the second example, it's removing duplicates where just ProdQty is not distinct. Since it's removing too many rows with the distinct as it's specified, the sum comes out lower than expected.

What you want to do is deduplicate the rows the way you know it's working (assuming that's the first query) and then perform the sum off of that intermediate result. You could use a CTE or subselect to do that. Here's an example:

select sum(ProdQty) from 
(
select distinct ProdQty, JobCompletionDate, JobHead.JobNum from erp.JobHead 
inner join erp.LaborDtl on JobHead.JobNum = LaborDtl.JobNum and JobHead.Company = LaborDtl.Company
where JobCompletionDate = '2022-01-04' and JobHead.Company = 'TD' and LaborDtl.JCDept ='MS'
);