There is a jobs
table,
jobID | state | timestamp |
---|---|---|
1 | inProgress | 2023-11-02 10:39:37 |
1 | some random log | 2023-11-02 10:40:37 |
2 | inProgress | 2023-11-02 10:43:37 |
1 | inProgress | 2023-11-02 10:44:37 |
1 | inProgress | 2023-11-02 10:45:37 |
1 | inProgress | 2023-11-02 10:50:00 |
1 | end | 2023-11-02 11:40:37 |
1 | inProgress | 2023-11-02 11:43:37 |
1 | inProgress | 2023-11-02 11:44:37 |
1 | end | 2023-11-02 11:45:37 |
2 | some random log | 2023-11-02 11:50:37 |
2 | end | 2023-11-02 12:00:37 |
I would like to
- calculate how many jobs are in progress (i.e. has already started but not end yet)
I first thought I could calculate count(state=inProgress)-count(state=end)
to get the number of inProgress jobs but inProgress
could show up not only once.
- get execution duration for each finished job
Note that a job can execute multiple times, exg. jobID1
executed from start to finish twice, therefore two execution durations need to be calculated for jobID1
.
jobID1
execution duration is time diff between 2023-11-02 11:40:37, 2023-11-02 10:39:37
AND time diff between 2023-11-02 11:45:37, 2023-11-02 11:43:37
jobID2
execution duration is time diff between 2023-11-02 12:00:37, 2023-11-02 10:43:37
CodePudding user response:
CREATE TABLE jobs(
jobID INT,
state VARCHAR(50),
timestamp datetime);
INSERT INTO jobs VALUES
(1, 'inProgress', '2023-11-02 10:39:37'),
(1, 'some random log', '2023-11-02 10:40:37'),
(2, 'inProgress', '2023-11-02 10:43:37'),
(1, 'inProgress', '2023-11-02 10:44:37'),
(1, 'inProgress', '2023-11-02 10:45:37'),
(1, 'inProgress', '2023-11-02 10:50:00'),
(1, 'end', '2023-11-02 11:40:37'),
(1, 'inProgress', '2023-11-02 11:43:37'),
(1, 'inProgress', '2023-11-02 11:44:37'),
(1, 'end', '2023-11-02 11:45:37'),
(2, 'some random log', '2023-11-02 11:50:37'),
(2, 'end', '2023-11-02 12:00:37');
Based on the data sample above, here is a suggestion:
WITH end_state AS (
SELECT ROW_NUMBER() OVER (ORDER BY jobID, timestamp) AS Rsnum,
jobID, timestamp AS end_time
FROM jobs
WHERE state='end'),
es_wDelimiter AS (
SELECT e1.*,
IF(e2.end_time < e1.end_time, e2.end_time,NULL) AS esDelimiter
FROM end_state e1
LEFT JOIN end_state e2 ON e1.jobID=e2.jobID AND e1.end_time <> e2.end_time)
SELECT Rsnum,
j.jobID,
MIN(timestamp) AS StartTime,
MAX(end_time) AS EndTime,
TIMEDIFF(MAX(end_time),MIN(timestamp)) AS ExecutionTime
FROM jobs j
LEFT JOIN es_wDelimiter e
ON j.jobID=e.jobID
AND j.timestamp > IFNULL(esDelimiter,0)
AND j.timestamp <= e.end_time
GROUP BY Rsnum, j.jobID
ORDER BY RsNum, j.jobID;
The query operation are as following:
- Get all timestamp with state=end and assign with row number - the row numbers are used as new group.
- Wrap the first query as common table expression and do another query to create a custom delimiter.
- wrap the second query as another common table expression and do a LEFT JOIN query to it with jobs table with these conditions:
jobs.timestamp
value is larger than the delimiter value that been set earlier (if the delimiter value is NULL then it will be assigned with0
).jobs.timestamp
value is smaller or the same as theend_time
value we extracted from the first query.
Here is a fiddle for reference: https://dbfiddle.uk/RBEHLuji
CodePudding user response:
For a job that may have multiple instances the following approach "looks backward" to locate any previous ends for the same jobid, each instance of the jobid is also assigned using row_number()
. Then these rows for "ends" are then joined to any rows representing "In Progress" with conditions to ensure they fit within "previous end" (or NULL) and "this end". Then the minimum start of each job instance is calculated, and the duration (in seconds) also made available.
WITH e AS (
SELECT
id
, jobid
, LAG(TIMESTAMP) OVER (PARTITION BY jobID ORDER BY TIMESTAMP) AS prev_end
, TIMESTAMP AS end_time
, row_number() OVER (PARTITION BY jobid ORDER BY TIMESTAMP) AS job_inst
FROM jobs
WHERE STATE = 'end'
)
SELECT
e.jobid
, e.job_inst
, min(p.start_time) start_time
, e.end_time
, TIMESTAMPDIFF(SECOND, min(p.start_time), e.end_time) AS du_seconds
FROM e
INNER JOIN (
SELECT
id
, jobID
, TIMESTAMP AS start_time
FROM jobs
WHERE STATE = 'inProgress'
) p ON e.jobid = p.jobid
AND (
(p.start_time < e.end_time and e.prev_end IS NULL)
OR
(p.start_time > e.prev_end and p.start_time < e.end_time)
)
GROUP BY
e.jobid
, e.job_inst
, e.end_time
ORDER BY
e.jobid
, e.job_inst
jobid | job_inst | start_time | end_time | du_seconds |
---|---|---|---|---|
1 | 1 | 2023-11-02 10:39:37 | 2023-11-02 11:40:37 | 3660 |
1 | 2 | 2023-11-02 11:43:37 | 2023-11-02 11:45:37 | 120 |
2 | 1 | 2023-11-02 10:43:37 | 2023-11-02 12:00:37 | 4620 |
See fiddle
Once you have the jobs and instances of them available, the min/max start/end and total duration can be easily calculated from the result seen above (optionally the max(job_inst) could also be displayed).
SELECT
jobid
, MAX(job_inst)
, MIN(start_time)
, MAX(end_time)
, SUM(du_seconds)
FROM (
{ result from above }
) d
GROUP BY
jobid
CodePudding user response:
To count the number of jobs that are in progress (started but not yet finished):
SELECT COUNT(DISTINCT jobID) AS inProgressJobs
FROM jobs
WHERE jobID NOT IN (
SELECT jobID
FROM jobs
WHERE state = 'end'
);
To calculate the execution duration for each finished job:
SELECT jobID,
MIN(CASE WHEN state = 'end' THEN timestamp END) AS end_time,
MAX(CASE WHEN state = 'inProgress' THEN timestamp END) AS start_time
FROM jobs
GROUP BY jobID, state
HAVING start_time IS NOT NULL AND end_time IS NOT NULL;