How to get exact rows as result set using CTE?-CodePudding

CREATE TABLE test_tab (
    col_name  VARCHAR2(20),
    log_time  TIMESTAMP(6),
    status    VARCHAR2(20)
);

INSERT INTO test_tab VALUES('Engineering','08-06-22 08:09:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Engineering','09-06-22 08:28:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Engineering','09-06-22 08:13:16.366000000 PM','DOWN');
INSERT INTO test_tab VALUES('Commerce','07-06-22 4:59:16.366000000 PM','DOWN');
INSERT INTO test_tab VALUES('Commerce','07-06-22 6:34:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Commerce','07-06-22 6:49:16.366000000 PM','DOWN');
INSERT INTO test_tab VALUES('Commerce','01-06-22 2:15:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Commerce','07-06-22 07:04:16.366000000 PM','UP');
COMMIT;

Tool used: SQL Developer(18c)

This is the dataset wherein

I need to filter based on the status = 'DOWN' I will get 3 rows which will be my master data.
For the above 3 rows I need to check in the same table i.e test_tab with the filter status = 'UP' and find the log time which is just greater than the log time of status DOWN

My Attempt:

WITH a AS(
SELECT  col_name, log_time AS start_time, status
  FROM  test_tab
WHERE status = 'DOWN'),
b AS(
SELECT col_name,log_time AS end_time, status
FROM test_tab
WHERE status = 'UP')
SELECT a.col_name, a.start_time,a.status,
b.end_time,b.status
FROM a
JOIN b ON(a.col_name = b.col_name)
WHERE a.start_time < b.end_time;

But from the above query I am getting duplicate records for col_name = 'Commerce' and giving me the 4 rows in my result but ideally it should be only 3 rows based on the filter DOWN.

Expected Output:

Col_name    start_time                      status      end_time                  status
Engineering 09-06-22 8:13:16.366000000 PM   DOWN    09-06-22 8:28:16.366000000 PM   UP
Commerce    07-06-22 4:59:16.366000000 PM   DOWN    07-06-22 6:34:16.366000000 PM   UP
Commerce    07-06-22 6:49:16.366000000 PM   DOWN    07-06-22 7:04:16.366000000 PM   UP

CodePudding user response：

You are getting everything right but you need to decide which of the two results from the second table you want out in your result ? When you are trying to join the two tables based on the name alone you understand that there will be a lot of data in your result right(CHECK THE 3rd query in my demo)? And then you add a where clause and filter that data some more. But for one data from the cte a is paired two times with the data in the cte b so in my proposition I have decided to select the smallest one. Does this give you what you need ?

WITH a AS(
SELECT  col_name, log_time AS start_time, status
  FROM  test_tab
WHERE status = 'DOWN'),
b AS(
SELECT col_name,log_time AS end_time, status
FROM test_tab
WHERE status = 'UP')
SELECT a.col_name
       , a.start_time
       , a.status
       , MIN(b.end_time)
       , b.status
FROM a
LEFT JOIN b ON a.col_name = b.col_name
WHERE a.start_time < b.end_time
GROUP BY a.col_name
       , a.start_time
       , a.status
       , b.status ;

Here is a demo

CodePudding user response：

Here's one option, which performs a self-join of the test_tab table and uses a subquery which fetches the first "UP" time that is higher than its "DOWN" time.

SQL> select d.col_name, d.log_time start_time, d.status,
  2                     u.log_time end_time  , u.status
  3  from test_tab d join test_tab u on u.col_name = d.col_name
  4  where d.status = 'DOWN'
  5    and u.status = 'UP'
  6    and u.log_time = (select min(u1.log_time)
  7                      from test_tab u1
  8                      where u1.col_name = u.col_name
  9                        and u1.status = u.status
 10                        and u1.log_time > d.log_time
 11                     );

COL_NAME    START_TIME                     STATUS END_TIME                       STATUS
----------- ------------------------------ ------ ------------------------------ ------
Engineering 09-06-22 08:13:16:366000000 pm DOWN   09-06-22 08:28:16:366000000 pm UP
Commerce    07-06-22 04:59:16:366000000 pm DOWN   07-06-22 06:34:16:366000000 pm UP
Commerce    07-06-22 06:49:16:366000000 pm DOWN   07-06-22 07:04:16:366000000 pm UP

SQL>

CodePudding user response：

From Oracle 12, this should be done using MATCH_RECOGNIZE which performs row-by-row processing (and does not require a self-join).

If you want a DOWN and then an immediately following UP then:

SELECT *
FROM   test_tab
MATCH_RECOGNIZE(
  PARTITION BY col_name
  ORDER BY log_time
  MEASURES
    FIRST(down.log_time) AS start_time,
    FIRST(up.log_time) AS end_time
  PATTERN (down up)
  DEFINE
    down AS status = 'DOWN',
    up   AS status = 'UP'
)

If you want a DOWN and then the earliest following UP then:

SELECT *
FROM   test_tab
MATCH_RECOGNIZE(
  PARTITION BY col_name
  ORDER BY log_time
  MEASURES
    FIRST(down.log_time) AS start_time,
    FIRST(up.log_time) AS end_time
  AFTER MATCH SKIP TO NEXT ROW
  PATTERN (down other*? up)
  DEFINE
    down AS status = 'DOWN',
    up   AS status = 'UP'
)

If you want to do it without MATCH_RECOGNIZE then you can use a windowed analytic function (which also does not require a self-join):

SELECT col_name,
       log_time AS start_time,
       end_time
FROM   (
  SELECT t.*,
         MIN(CASE status WHEN 'UP' THEN log_time END) OVER (
           PARTITION BY col_name
           ORDER BY log_time
           ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
         ) AS end_time
  FROM   test_tab t
)
WHERE  status = 'DOWN'

Which, for your sample data, all output:

COL_NAME START_TIME END_TIME

Commerce 2022-06-07 16:59:16.366000 2022-06-07 18:34:16.366000

Commerce 2022-06-07 18:49:16.366000 2022-06-07 19:04:16.366000

Engineering 2022-06-09 20:13:16.366000 2022-06-09 20:28:16.366000

COL_NAME	START_TIME	END_TIME
Commerce	2022-06-07 16:59:16.366000	2022-06-07 18:34:16.366000
Commerce	2022-06-07 18:49:16.366000	2022-06-07 19:04:16.366000
Engineering	2022-06-09 20:13:16.366000	2022-06-09 20:28:16.366000

db<>fiddle here