I have a Postgres table that contains a date and status field. I want to create a query that will return the date, plus the total number of records and then the total number of records for each status on that date.
Source Table:
job_id, process_datetime, process_status
The results I would like:
process_date | total_925_jobs | total_completed_925_jobs
2022-01-02 | 50 | 45
2022-01-03 | 150 | 135
I tried to join to subqueries, but it does not like the calculated date field.
SELECT
date(all_records.create_datetime) AS process_date,
total_jobs.total_925_jobs,
total_completed.total_completed_925_jobs
from "925-FilePreprocessing"
all_records
INNER JOIN
( SELECT
date("925-FilePreprocessing".create_datetime) AS total_process_date,
"925-FilePreprocessing".process_status,
COUNT("925-FilePreprocessing".file_preprocessing_id) as total_925_jobs
FROM
"925-FilePreprocessing"
where
"925-FilePreprocessing".create_datetime > '2022-01-01'
GROUP BY
total_process_date, process_status
) as "total_jobs"
ON date(all_records.create_datetime) = date(total_jobs.total_process_date)
INNER JOIN
(SELECT
date("925-FilePreprocessing".create_datetime) AS completed_process_date,
COUNT("925-FilePreprocessing".file_preprocessing_id) as total_completed_925_jobs
FROM
"925-FilePreprocessing"
where
"925-FilePreprocessing".create_datetime > '2022-01-01'
and ("925-FilePreprocessing".process_status = 'completed'
or "925-FilePreprocessing".process_status = 'completed-duplicated'
or "925-FilePreprocessing".process_status = 'completed-duplicated-published'
or "925-FilePreprocessing".process_status = 'completed-not_a_drawing'
)
GROUP BY
completed_process_date
) as "total_completed"
ON all_records.process_date = total_completed.completed_process_date
ORDER BY
process_date
I get an error:
ERROR: column all_records.process_date does not exist
LINE 42: ON all_records.process_date = total_completed.completed_pro...
^
CodePudding user response:
Conditional count may be usefull
Old way (using sum) - before Postgresql 9.4
select
a.process_datetime::DATE,
count(*) total_925_jobs,
sum ( case when a.process_status in ('completed',
'completed-duplicated',
'completed-duplicated-published',
'completed-not_a_drawing')
then 1
else 0 end) total_completed_925_jobs
from "925-FilePreprocessing" a
where a.process_datetime::DATE >= '2021-01-01'
group by a.process_datetime::DATE
New way - from POstgresql 9.4
(using filter
)
count(*) total_925_jobs,
count(*) filter (where a.process_status in ('completed', 'completed-duplicated', 'completed-duplicated-published', 'completed-not_a_drawing')) total_completed_925_jobs
from "925-FilePreprocessing" a
where a.process_datetime::DATE >= '2021-01-01'
group by a.process_datetime::DATE
Going back to your query - I have error column 925-FilePreprocessing.create_datetime does not exist
which is different than yours. Check if table definition you deliver is complete.
CodePudding user response:
the result you like
process_date | total_925_jobs | total_completed_925_jobs 2022-01-02 | 50 | 45 2022-01-03 | 150 | 135
since total_completed have far less row than total_jobs means that there are only two date/datetime greater than '2022-01-01'.
the follow query can be get your result. I declutter a lot unnecessary code.
group by 1 mean: https://www.cybertec-postgresql.com/en/postgresql-group-by-expression/
WITH total_jobs AS (
SELECT
create_datetime::date AS total_process_date,
process_status,
COUNT(file_preprocessing_id) AS total_925_jobs
FROM
"925-FilePreprocessing"
WHERE
create_datetime::date > '2022-01-01'::date
GROUP BY
1,
2
),
total_completed AS (
SELECT
date("925-FilePreprocessing".create_datetime) AS completed_process_date,
COUNT(file_preprocessing_id) AS total_completed_925_jobs
FROM
"925-FilePreprocessing"
WHERE
create_datetime::date > '2022-01-01'
AND process_status IN ('completed', 'completed-duplicated', 'completed-duplicated-published', 'completed-not_a_drawing')
GROUP BY
1
)
SELECT
total_jobs. *,
tp.total_completed_925_jobs
FROM
total_jobs tk
JOIN total_completed tp ON tk.total_process_date = tp.completed_process_date