Home > database >  How to get exact rows as result set using CTE?
How to get exact rows as result set using CTE?

Time:06-13

CREATE TABLE test_tab (
    col_name  VARCHAR2(20),
    log_time  TIMESTAMP(6),
    status    VARCHAR2(20)
);

INSERT INTO test_tab VALUES('Engineering','08-06-22 08:09:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Engineering','09-06-22 08:28:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Engineering','09-06-22 08:13:16.366000000 PM','DOWN');
INSERT INTO test_tab VALUES('Commerce','07-06-22 4:59:16.366000000 PM','DOWN');
INSERT INTO test_tab VALUES('Commerce','07-06-22 6:34:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Commerce','07-06-22 6:49:16.366000000 PM','DOWN');
INSERT INTO test_tab VALUES('Commerce','01-06-22 2:15:16.366000000 PM','UP');
INSERT INTO test_tab VALUES('Commerce','07-06-22 07:04:16.366000000 PM','UP');
COMMIT;

Tool used: SQL Developer(18c)

This is the dataset wherein

  1. I need to filter based on the status = 'DOWN' I will get 3 rows which will be my master data.
  2. For the above 3 rows I need to check in the same table i.e test_tab with the filter status = 'UP' and find the log time which is just greater than the log time of status DOWN

My Attempt:

WITH a AS(
SELECT  col_name, log_time AS start_time, status
  FROM  test_tab
WHERE status = 'DOWN'),
b AS(
SELECT col_name,log_time AS end_time, status
FROM test_tab
WHERE status = 'UP')
SELECT a.col_name, a.start_time,a.status,
b.end_time,b.status
FROM a
JOIN b ON(a.col_name = b.col_name)
WHERE a.start_time < b.end_time;

But from the above query I am getting duplicate records for col_name = 'Commerce' and giving me the 4 rows in my result but ideally it should be only 3 rows based on the filter DOWN.

Expected Output:

Col_name    start_time                      status      end_time                  status
Engineering 09-06-22 8:13:16.366000000 PM   DOWN    09-06-22 8:28:16.366000000 PM   UP
Commerce    07-06-22 4:59:16.366000000 PM   DOWN    07-06-22 6:34:16.366000000 PM   UP
Commerce    07-06-22 6:49:16.366000000 PM   DOWN    07-06-22 7:04:16.366000000 PM   UP

CodePudding user response:

You are getting everything right but you need to decide which of the two results from the second table you want out in your result ? When you are trying to join the two tables based on the name alone you understand that there will be a lot of data in your result right(CHECK THE 3rd query in my demo)? And then you add a where clause and filter that data some more. But for one data from the cte a is paired two times with the data in the cte b so in my proposition I have decided to select the smallest one. Does this give you what you need ?

WITH a AS(
SELECT  col_name, log_time AS start_time, status
  FROM  test_tab
WHERE status = 'DOWN'),
b AS(
SELECT col_name,log_time AS end_time, status
FROM test_tab
WHERE status = 'UP')
SELECT a.col_name
       , a.start_time
       , a.status
       , MIN(b.end_time)
       , b.status
FROM a
LEFT JOIN b ON a.col_name = b.col_name
WHERE a.start_time < b.end_time
GROUP BY a.col_name
       , a.start_time
       , a.status
       , b.status ;

Here is a demo

CodePudding user response:

Here's one option, which performs a self-join of the test_tab table and uses a subquery which fetches the first "UP" time that is higher than its "DOWN" time.

SQL> select d.col_name, d.log_time start_time, d.status,
  2                     u.log_time end_time  , u.status
  3  from test_tab d join test_tab u on u.col_name = d.col_name
  4  where d.status = 'DOWN'
  5    and u.status = 'UP'
  6    and u.log_time = (select min(u1.log_time)
  7                      from test_tab u1
  8                      where u1.col_name = u.col_name
  9                        and u1.status = u.status
 10                        and u1.log_time > d.log_time
 11                     );

COL_NAME    START_TIME                     STATUS END_TIME                       STATUS
----------- ------------------------------ ------ ------------------------------ ------
Engineering 09-06-22 08:13:16:366000000 pm DOWN   09-06-22 08:28:16:366000000 pm UP
Commerce    07-06-22 04:59:16:366000000 pm DOWN   07-06-22 06:34:16:366000000 pm UP
Commerce    07-06-22 06:49:16:366000000 pm DOWN   07-06-22 07:04:16:366000000 pm UP

SQL>

CodePudding user response:

From Oracle 12, this should be done using MATCH_RECOGNIZE which performs row-by-row processing (and does not require a self-join).

If you want a DOWN and then an immediately following UP then:

SELECT *
FROM   test_tab
MATCH_RECOGNIZE(
  PARTITION BY col_name
  ORDER BY log_time
  MEASURES
    FIRST(down.log_time) AS start_time,
    FIRST(up.log_time) AS end_time
  PATTERN (down up)
  DEFINE
    down AS status = 'DOWN',
    up   AS status = 'UP'
)

If you want a DOWN and then the earliest following UP then:

SELECT *
FROM   test_tab
MATCH_RECOGNIZE(
  PARTITION BY col_name
  ORDER BY log_time
  MEASURES
    FIRST(down.log_time) AS start_time,
    FIRST(up.log_time) AS end_time
  AFTER MATCH SKIP TO NEXT ROW
  PATTERN (down other*? up)
  DEFINE
    down AS status = 'DOWN',
    up   AS status = 'UP'
)

If you want to do it without MATCH_RECOGNIZE then you can use a windowed analytic function (which also does not require a self-join):

SELECT col_name,
       log_time AS start_time,
       end_time
FROM   (
  SELECT t.*,
         MIN(CASE status WHEN 'UP' THEN log_time END) OVER (
           PARTITION BY col_name
           ORDER BY log_time
           ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
         ) AS end_time
  FROM   test_tab t
)
WHERE  status = 'DOWN'

Which, for your sample data, all output:

COL_NAME START_TIME END_TIME
Commerce 2022-06-07 16:59:16.366000 2022-06-07 18:34:16.366000
Commerce 2022-06-07 18:49:16.366000 2022-06-07 19:04:16.366000
Engineering 2022-06-09 20:13:16.366000 2022-06-09 20:28:16.366000

db<>fiddle here

  • Related