Suppose the table below is my raw data:
ID | Type | Date | Value1 | Value2 |
---|---|---|---|---|
1 | A | 31-Oct-22 | NULL | 0.5 |
1 | B | 31-Oct-22 | NULL | 0.6 |
1 | C | 31-Oct-22 | 0.8 | 0.7 |
1 | A | 30-Sep-22 | 0.6 | NULL |
2 | A | 31-Oct-22 | 0.2 | NULL |
2 | C | 31-Oct-22 | NULL | 0.3 |
2 | B | 30-Sep-22 | NULL | NULL |
2 | D | 30-Sep-22 | NULL | NULL |
I want to count the distinct IDs that have at least one non-NULL entry in Value1 and, separately, in Value2, grouped by Date. Ideally the query output would look like this:
Date | Value1 | Value2 |
---|---|---|
31-Oct-22 | 2 | 2 |
30-Sep-22 | 1 | 0 |
Explanation of the above values:
For 31-Oct-22 & Value1: both ID 1 and ID 2 have at least one non-NULL entry among their rows, so the distinct count is 2.
For 31-Oct-22 & Value2: both ID 1 and ID 2 again have at least one non-NULL entry, so the distinct count is 2.
For 30-Sep-22 & Value1: ID 1 has a single entry, which is non-NULL, while both of ID 2's entries are NULL, so the distinct count is 1.
For 30-Sep-22 & Value2: both ID 1 and ID 2 have only NULL entries, so the count is 0.
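The counting rule described above can be sketched in plain Python, just to make the expected numbers checkable. This is a minimal illustration, not part of the original post: the `ROWS` list is the sample table copied by hand, and the function name `distinct_non_null_ids` is a hypothetical helper.

```python
from collections import defaultdict

# Sample rows copied from the table above: (id, type, date, value1, value2).
# None stands in for SQL NULL.
ROWS = [
    (1, "A", "31-Oct-22", None, 0.5),
    (1, "B", "31-Oct-22", None, 0.6),
    (1, "C", "31-Oct-22", 0.8, 0.7),
    (1, "A", "30-Sep-22", 0.6, None),
    (2, "A", "31-Oct-22", 0.2, None),
    (2, "C", "31-Oct-22", None, 0.3),
    (2, "B", "30-Sep-22", None, None),
    (2, "D", "30-Sep-22", None, None),
]


def distinct_non_null_ids(rows):
    """Return {date: (ids_with_value1, ids_with_value2)} as distinct counts."""
    ids_v1 = defaultdict(set)  # date -> IDs with at least one non-NULL Value1
    ids_v2 = defaultdict(set)  # date -> IDs with at least one non-NULL Value2
    dates = []                 # dates in order of first appearance
    for id_, _type, date, v1, v2 in rows:
        if date not in dates:
            dates.append(date)
        if v1 is not None:
            ids_v1[date].add(id_)
        if v2 is not None:
            ids_v2[date].add(id_)
    return {d: (len(ids_v1[d]), len(ids_v2[d])) for d in dates}


if __name__ == "__main__":
    for date, (c1, c2) in distinct_non_null_ids(ROWS).items():
        print(date, c1, c2)
```

Collecting IDs into a set per date is what makes the count "distinct": an ID with several non-NULL rows on the same date is still counted once.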
I initially thought about using DISTINCT. However, I'm not sure how to apply DISTINCT to one column (ID) while counting the non-NULL values of other columns.
Please help me.
CodePudding user response:
We can do this with two levels of aggregation. In Postgres, we could use boolean aggregate functions:
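select date,
    -- outer level: count the IDs flagged as having a value on each date
    count(*) filter(where has_value_1) cnt_value1,
    count(*) filter(where has_value_2) cnt_value2
from (
    -- inner level: one row per (date, id), flagging whether any row is non-null
    select date, id,
        bool_or(value1 is not null) has_value_1,
        bool_or(value2 is not null) has_value_2
    from mytable
    group by date, id
) t
group by date
order by date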
A more portable way to phrase this would be:
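select date,
    -- outer level: the flags are 0/1, so their sum is the number of IDs
    sum(has_value_1) cnt_value1,
    sum(has_value_2) cnt_value2
from (
    -- inner level: max() yields 1 if any row for this (date, id) is non-null
    select date, id,
        max(case when value1 is not null then 1 else 0 end) has_value_1,
        max(case when value2 is not null then 1 else 0 end) has_value_2
    from mytable
    group by date, id
) t
group by date
order by date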
Result for the sample data:
date | cnt_value1 | cnt_value2 |
---|---|---|
2022-09-30 | 1 | 0 |
2022-10-31 | 2 | 2 |
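To try the portable query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer above: SQLite stands in for whatever database is actually used, the dates are stored as ISO text so that `order by date` sorts chronologically, and the helper name `run` is hypothetical. The second query, `SINGLE_PASS_QUERY`, is an alternative not given in the answer: it relies on `COUNT(DISTINCT ...)` ignoring NULLs, so a `CASE` expression without `ELSE` keeps only IDs with a non-null value.

```python
import sqlite3

# Sample rows from the question, with dates as ISO text (an assumption made
# here so that ORDER BY date sorts chronologically in SQLite).
ROWS = [
    (1, "A", "2022-10-31", None, 0.5),
    (1, "B", "2022-10-31", None, 0.6),
    (1, "C", "2022-10-31", 0.8, 0.7),
    (1, "A", "2022-09-30", 0.6, None),
    (2, "A", "2022-10-31", 0.2, None),
    (2, "C", "2022-10-31", None, 0.3),
    (2, "B", "2022-09-30", None, None),
    (2, "D", "2022-09-30", None, None),
]

# The portable two-level aggregation from the answer.
PORTABLE_QUERY = """
SELECT date,
       SUM(has_value_1) AS cnt_value1,
       SUM(has_value_2) AS cnt_value2
FROM (
    SELECT date, id,
           MAX(CASE WHEN value1 IS NOT NULL THEN 1 ELSE 0 END) AS has_value_1,
           MAX(CASE WHEN value2 IS NOT NULL THEN 1 ELSE 0 END) AS has_value_2
    FROM mytable
    GROUP BY date, id
) t
GROUP BY date
ORDER BY date
"""

# Alternative (not in the answer): a single aggregation level. The CASE
# yields NULL for rows without a value, and COUNT(DISTINCT ...) skips NULLs.
SINGLE_PASS_QUERY = """
SELECT date,
       COUNT(DISTINCT CASE WHEN value1 IS NOT NULL THEN id END) AS cnt_value1,
       COUNT(DISTINCT CASE WHEN value2 IS NOT NULL THEN id END) AS cnt_value2
FROM mytable
GROUP BY date
ORDER BY date
"""


def run(query):
    """Load the sample rows into a fresh in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable "
            "(id INTEGER, type TEXT, date TEXT, value1 REAL, value2 REAL)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(PORTABLE_QUERY):
        print(row)
```

On this sample data both queries produce the same rows. The two-level form makes the "does this ID have any value on this date" step explicit, while the single-pass form is shorter.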