Count (field) and Count(distinct field) both outputting wrong counts in SQL-CodePudding

I have 2 tables i am joining, and have created a unique field, a concatenation of 2 string fields. When trying to get a count of that unique field using count(unique_field), I get an incorrect count. My unique_field does include nulls in some of the results, which are being counted correctly. When using count(distinct unique_field) I get 1's and 0's, which is also incorrect. What is a way I can figure this count out??

SELECT DISTINCT        
    CONCAT(product_id,'_', channel_id ) as unique_id,
    count(DISTINCT unique_id) as unique_id_count,
    field c,
    field d
From tablea full outer join tableb
ON tablea.product_id=tableb.product_id
group by 3,4

Example output: column A is unique_Id, Column B is actual count of unique_Id, and column c is the results of unique_id_count

CodePudding user response：

As per my understanding you want to count how many occurrences exist for an individual unique_id.

Therefore, I would suggest to group by unique_id and not by field c and field d.

SELECT DISTINCT        
    CONCAT(product_id,'_', channel_id ) as unique_id,
    count(DISTINCT unique_id) as unique_id_count
From tablea full outer join tableb
ON tablea.product_id=tableb.product_id
group by CONCAT(product_id,'_', channel_id )

CodePudding user response：

so if we "try" make your SQL valid:

with tablea(product_id) as (
    select * from values
    ('a'),
    ('b'),
    ('c'),
    ('d'),
    ('e')
), tableb(product_id, channel_id) as (
    select * from values
    ('a', '1'),
    ('a', '2'),
    ('b', '3')
)
SELECT        
    CONCAT(NVL(a.product_id,b.product_id),'_', b.channel_id ) as unique_id
    ,count(DISTINCT unique_id) as unique_id_count
    --field c,
    --field d
From tablea as a
full outer join tableb as b
    ON a.product_id = b.product_id
group by 1

gives:

UNIQUE_ID	UNIQUE_ID_COUNT
a_1	1
a_2	1
b_3	1
null	0

Which makes sense as in this data there is only "one" thing per grouping. Now I get you had some fields c & d

so maybe like:

with tablea(product_id, field) as (
    select * from values
    ('a', 1),
    ('b', 1),
    ('c', 2),
    ('d', 2),
    ('e', 2)
), tableb(product_id, channel_id, field) as (
    select * from values
    ('a', '1', 10),
    ('a', '2', 10),
    ('a', '2', 10),
    ('b', '3', 10)
)
SELECT        
    CONCAT(NVL(a.product_id,b.product_id),'_', b.channel_id ) as unique_id
    ,count(DISTINCT unique_id) as unique_id_count
    ,a.field as c
    ,b.field as d
From tablea as a
full outer join tableb as b
    ON a.product_id = b.product_id
group by c, d

but this is not valid as unique_id is confusing the SQL, so we can inline that:

SELECT        
    count(DISTINCT CONCAT(NVL(a.product_id,b.product_id),'_', b.channel_id )) as unique_id_count
    ,a.field as c
    ,b.field as d
From tablea as a
full outer join tableb as b
    ON a.product_id = b.product_id
group by c, d

gives:

UNIQUE_ID_COUNT	C	D
3	1	10
0	2	null

But what you are trying to do is somewhat confusing.