The data returned as dataset
in the CTE below looks like:
|date | rows_added |
|-----------|------------|
|2022-04-18 | 100 |
|-----------|------------|
|2022-04-17 | 200 |
|-----------|------------|
|2022-04-17 | 600 |
|-----------|------------
How can I incorporate a count of the duplicate records, by date, in the following CTE?
with dataset as (
SELECT
date,
COUNT(*) as rows_added
FROM
my_table
WHERE
date between '2022-01-01 00:00:00'
AND '2022-04-18 00:00:00'
GROUP BY
date
)
SELECT
COUNT(*) as total_days_in_result_set,
COUNT(DISTINCT rows_added) as total_days_w_distinct_record_counts,
COUNT(*) - COUNT(DISTINCT rows_added) as toal_days_w_duplicate_record_counts,
FROM dataset
If I was going to only count the duplicate dates I would use the following but I can't incorporate it into the CTE above:
SELECT date, COUNT(date)
FROM my_table
GROUP BY date
HAVING COUNT(date) >1
Desired output given the example above:
total_days_in_result_set | total_days_w_distinct_record_counts | toal_days_w_duplicate_record_counts | duplicate_dates |
----------------------------------------------------------------------------------------------------------------------------
3 | 3 | 0 | 2
CodePudding user response:
Here's my solution:
WITH DATA AS (with dataset as (
SELECT
CAST(date as date),
COUNT(*) as rows_added
FROM
my_table
WHERE
date between '2022-04-17 00:00:00'
AND '2022-04-18 00:00:00'
GROUP BY
date
)
SELECT
COUNT(*) as total_days_in_result_set,
COUNT(DISTINCT rows_added) as total_days_w_distinct_record_counts,
COUNT(*) - COUNT(DISTINCT rows_added) as toal_days_w_duplicate_record_counts,
CASE WHEN COUNT(date) > 1 THEN 'YES' ELSE 'NO' END AS duplicate_dates
FROM dataset
)
SELECT
total_days_in_result_set,
total_days_w_distinct_record_counts,
toal_days_w_duplicate_record_counts,
COUNT(*) FILTER(WHERE duplicate_dates = 'YES') as count_of_duplicate_dates
FROM DATA
GROUP BY
total_days_in_result_set,
total_days_w_distinct_record_counts,
toal_days_w_duplicate_record_counts
CodePudding user response:
I think if you use the filter
in your final query, you can achieve that:
SELECT
COUNT(*) as total_days_in_result_set,
COUNT(DISTINCT rows_added) as total_days_w_distinct_record_counts,
COUNT(*) - COUNT(DISTINCT rows_added) as toal_days_w_duplicate_record_counts,
count (*) filter (where rows_added > 1) as duplicate_Dates
FROM dataset
The final field called "duplicate dates" is the example.