Get a Count of DISTINCT Rows having non NULL Values in Columns with a GroupBy clause

Time:11-15

Suppose the table below is my raw data:

ID  Type  Date       Value1  Value2
1   A     31-Oct-22  NULL    0.5
1   B     31-Oct-22  NULL    0.6
1   C     31-Oct-22  0.8     0.7
1   A     30-Sep-22  0.6     NULL
2   A     31-Oct-22  0.2     NULL
2   C     31-Oct-22  NULL    0.3
2   B     30-Sep-22  NULL    NULL
2   D     30-Sep-22  NULL    NULL
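
For anyone who wants to reproduce this, the raw data could be loaded with a script along the following lines. This is only a sketch, assuming Postgres: the table name mytable and the column types are assumptions (the original post gives neither), and the lowercase column names are chosen for convenience.

```sql
-- Hypothetical setup script: the table name and column types are assumptions.
create table mytable (
    id     int,
    type   varchar(1),
    date   date,
    value1 numeric(3,1),
    value2 numeric(3,1)
);

insert into mytable (id, type, date, value1, value2) values
    (1, 'A', date '2022-10-31', null, 0.5),
    (1, 'B', date '2022-10-31', null, 0.6),
    (1, 'C', date '2022-10-31', 0.8,  0.7),
    (1, 'A', date '2022-09-30', 0.6,  null),
    (2, 'A', date '2022-10-31', 0.2,  null),
    (2, 'C', date '2022-10-31', null, 0.3),
    (2, 'B', date '2022-09-30', null, null),
    (2, 'D', date '2022-09-30', null, null);
```

Note that date and type are accepted as unquoted column names in Postgres, but some other databases treat them as reserved words and would need them quoted or renamed.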

What I want is the count of unique IDs that have non-NULL values in Value1 and in Value2, grouped by Date. The query output should ideally be as follows:

Date       Value1  Value2
31-Oct-22  2       2
30-Sep-22  1       0

Explanation of the above values:

For 31-Oct-22 & Value1: ID 1 and ID 2 each have at least one non-NULL entry among their multiple rows, so the DISTINCT count is 2.

For 30-Sep-22 & Value1: ID 1 has a single entry, which is non-NULL, while both of ID 2's entries are NULL, so the DISTINCT count is 1.

For 30-Sep-22 & Value2: ID 1 and ID 2 have only NULL entries, so the count is 0.

I initially thought about using DISTINCT. However, I'm not sure how to apply DISTINCT to one column (ID) while also counting only the non-NULL values of the other columns.

Please help me.

CodePudding user response:

We can do this with two levels of aggregation. In Postgres, we could use boolean aggregate functions:

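-- Outer query: per date, count the ids flagged by the inner query.
select date,
    count(*) filter(where has_value_1) cnt_value1,
    count(*) filter(where has_value_2) cnt_value2
from (
    -- Inner query: one row per (date, id); a flag is true if any row has a non-NULL value.
    select date, id,
        bool_or(value1 is not null) has_value_1,
        bool_or(value2 is not null) has_value_2
    from mytable
    group by date, id
) t
group by date
order by date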

A more portable way to phrase this would be:

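-- Outer query: summing the 0/1 flags counts the ids that have a value on each date.
select date,
    sum(has_value_1) cnt_value1,
    sum(has_value_2) cnt_value2
from (
    -- Inner query: one row per (date, id); a flag is 1 if any row has a non-NULL value.
    select date, id,
        max(case when value1 is not null then 1 else 0 end) has_value_1,
        max(case when value2 is not null then 1 else 0 end) has_value_2
    from mytable
    group by date, id
) t
group by date
order by date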

Demo on DB Fiddle:

date        cnt_value1  cnt_value2
2022-09-30  1           0
2022-10-31  2           2
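
As a possible alternative to the two-level aggregation above, the same counts can probably be obtained in a single pass with a conditional count(distinct ...). This is a sketch rather than part of the original answer; it assumes the same mytable and column names, and relies on count(distinct ...) ignoring NULLs, so an id is only counted when the case expression returns it.

```sql
-- Single-level alternative: the case expression yields the id only for rows
-- where the value is non-NULL; count(distinct ...) skips the resulting NULLs.
select date,
    count(distinct case when value1 is not null then id end) cnt_value1,
    count(distinct case when value2 is not null then id end) cnt_value2
from mytable
group by date
order by date
```

On the sample data this should give the same two rows as the demo output. The trade-off is that each count(distinct ...) may require its own sort or hash step, so on large tables the two-level version could perform better.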