I would like to get latest row of job which is in failed state, but previous row for the same job should be success. I don't want to get records of job which is continuously failing. I am using MySQL version 5.6.10
Source Table job_status
:
Job_name start_date end_date status
A 8/3/2022 12 PM 8/3/2022 1.30 PM failed
B 8/3/2022 12 PM 8/3/2022 1.00 PM failed
C 8/3/2022 3 PM 8/3/2022 3.10 PM success
B 8/3/2022 3 PM 8/3/2022 3.30 PM failed
C 8/3/2022 3 PM 8/3/2022 3.20 PM success
A 8/3/2022 2 PM 8/3/2022 2.10 PM success
A 8/3/2022 3 PM 8/3/2022 3.20 PM failed
Desired Output:
Job_name start_date end_date status
A 8/3/2022 3 PM 8/3/2022 3.20 PM failed
I am using below query, not sure how to check for previous run.
select distinct(*)
from job_status
where status = 'failed'
order by start_date desc
CodePudding user response:
This won't win any medals as far as performance is concerned but...
select *
from t
where status = 'failed'
and coalesce((
-- previous status
-- coalesce is needed when there is no previous row
select status
from t as x
where x.job_name = t.job_name and x.start_date < t.start_date
order by x.start_date desc
limit 1
), 'success') = 'success'
and not exists (
-- no next record exists
select 1
from t as x
where x.job_name = t.job_name and x.start_date > t.start_date
)
CodePudding user response:
Any solution to the problem is going to depend on some very important assumptions which should have been addressed in the question:
- the state can only progress from failed to success
- by "latest" you mean the row the greatest end_date
- the end_date attribute is a DATETIME type
- the combination of end_date, job_status and job_name is unique
If any of these are not the case then the solution is impossible in SQL (and still very difficult with a procedural language in the absence of other information).
Again this is not an efficient function, although it would benefit from the existence of an index on the combination described in 4 above in the order job_status, job_name, end_date (which would otherwise be a poor choice for an index)....
SELECT c.*
FROM (
SELECT a.job_name, MAX(b.end_date) AS last_fail
FROM job_status a
JOIN job_status b
ON a.job_name=b.job_name
WHERE a.status='success'
AND b.status='failed'
GROUP BY a.job_name
) ilv
JOIN job_status c
ON ilv.job_name=c.job_name
AND ilv.last_fail=c.end_date
Really it's just a variant on the groupwise maximum question which still gets asked about once a week here and has its own chapter in the manual.
Depending on the distribution of the data, and the number of rows, a more efficient solution would probably be to read each row in the table, sorted by job_name, job_status (descending) and end_date then return the entries immediately after each "success" where the job_name matches the preceeding row. That should be possible with something like.....
SELECT job_name, job_status, start_date, end_date
FROM (
SELECT @prev_status AS prev_status, @prev_job_name AS prev_job_name,
a.*,
@prev_status:=a.status AS currstatus,
@prev_job_name:=a.job_name AS currjobname
FROM (
SELECT b.*
FROM job_status b
ORDER BY b.job_name, b.status DESC, d.end_date DESC
) a
) c
WHERE c.prev_status-'success'
AND c.job_status='failed'
AND c.prev_job_name=job_name
CodePudding user response:
Untested but you should be able to express this with a single correlated exists clause:
select *
from job_status s
where s.status='failed'
and exists (
select * from job_status s2
where s2.job_name = s.job_name
and s2.status = 'success'
and s2.end_date < s.end_date
order by s2.end_date desc
limit 1)
order by start_date desc;