suppose we have the table below to track the shipping department's machine. It typically takes 5 mins for the machine to scan certain numbers of packages. However, it might have some problems during the scan process. Therefore, we would like to track each of the scan event durations? (Please refer to the below table)
date | eventTime | event | actions |
---|---|---|---|
2020-02-01 | 2020-02-01 10:05:25 | scan_package | start |
2020-02-01 | 2020-02-01 10:10:25 | scan_package | end |
2020-02-01 | 2020-02-01 10:25:25 | scan_package | start |
2020-02-01 | 2020-02-01 10:30:25 | error | |
2020-02-01 | 2020-02-01 10:35:25 | scan_package | start |
2020-02-01 | 2020-02-01 10:40:25 | scan_package | end |
2020-02-01 | 2020-02-01 10:42:25 | scan_package | start |
2020-02-01 | 2020-02-01 10:47:25 | scan_package | end |
2020-02-01 | 2020-02-01 10:51:25 | scan_package | start |
2020-02-01 | scan_package | start | |
2020-02-02 | 2020-02-02 08:45:25 | scan_package | start |
The scan duration is equivalent: (event = scan_package & action = end) - (event = scan_package & action = start)
However, there are some corner cases that need to handle.
- for row3 (eventTime:2020-02-01_10:25:25) from the table, there is no corresponding end scan event, due to within 5 mins after the scan started, there is no scan ended. Therefore, we will not have the scan duration for this row, but we still need to record the event to the final table. (not sure how to record in this case)
Basically, in order to consider a valid scan duration, the start and end should be within 6 mins.
will not consider the scan event is valid if there is no eventTime like (last second row from the table), we will just ignore the scan event.
assume there are no situations like (actions = start, start, start, then end ...) which means there will not be multiple start actions happening at the same time or in sequence.
assume the column eventTime is timestamp type
Sample desired final output table:
date | eventTime1 | event1 | eventTime2 | event2 | scan_duration_in_mins |
---|---|---|---|---|---|
2020-02-01 | 2020-02-01 10:05:25 | scan_start | 2020-02-01 10:10:25 | scan_end | 5 |
2020-02-01 | 2020-02-01 10:10:25 | scan_start | null | null | null |
2020-02-01 | 2020-02-01 10:35:25 | scan_start | 2020-02-01 10:40:25 | scan_end | 5 |
2020-02-01 | 2020-02-01 10:42:25 | scan_start | 2020-02-01 10:47:25 | scan_end | 5 |
2020-02-01 | 2020-02-01 10:51:25 | scan_start | null | null | null |
Below is some of my code,
CREATE TABLE scan (
date DATE,
eventTime datetime,
event VARCHAR(100),
actions VARCHAR(100)
);
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:05:25', 'scan_package' , 'start');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:10:25', 'scan_package' , 'end');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:25:25', 'scan_package' , 'start');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:30:25', 'error' , null );
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:35:25', 'scan_package' , 'start');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:40:25', 'scan_package' , 'end');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:42:25', 'scan_package' , 'start');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:47:25', 'scan_package' , 'end');
INSERT INTO scan VALUES ( '2020-02-01' , '2020-02-01 10:51:25', 'scan_package' , 'start');
INSERT INTO scan VALUES ( '2020-02-01' , null , 'scan_package' , 'start');
INSERT INTO scan VALUES ( '2020-02-02' , '2020-02-02 08:45:25', 'scan_package' , 'start');
SELECT
t1.date as date,
t1.eventTime as scan_start_time,
t1.event as scan_start,
t2.eventTime as scan_end_time,
t2.event as scan_end
FROM scan as t1
LEFT JOIN scan as t2 ON t2.eventTime BETWEEN t1.eventTime INTERVAL 4 MINUTE
AND t1.eventTime INTERVAL 6 MINUTE
AND t1.actions = 'start'
AND t2.actions <> 'start';
However, the code above did not get the desired output. Can someone please provide me with some suggestions or SQL code? Thanks a lot
CodePudding user response:
in order to consider a valid scan duration, the start and end should be within 6 mins.
SELECT *, TIMESTAMPDIFF(MINUTE, t1.eventTime, t2.eventTime) duration
FROM scan t1
LEFT JOIN scan t2 ON t2.eventTime BETWEEN t1.eventTime INTERVAL 4 MINUTE
AND t1.eventTime INTERVAL 6 MINUTE
AND t2.actions <> 'start'
WHERE t1.actions = 'start'
AND t1.eventTime IS NOT NULL;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ab9a56a4f12d2462139ecba24eaacbdf