Home > Blockchain >  How to merge consecutive column values in Oracle into one and save initial order?
How to merge consecutive column values in Oracle into one and save initial order?

Time:05-31

I have the following table.


Table 1

 --------- ----------- 
| col_key | col_value |
 --------- ----------- 
| key1    | value1    |
| key1    | value2    |
| key1    | value3    |
| key1    | value2    |
| key1    | value2    |
| key1    | value2    |
| key1    | value1    |
| key1    | value1    |
| key1    | value3    |
| key1    | value3    |
| key1    | value3    |
| key1    | value2    |
 --------- ----------- 

I want to get the following output:


Table 2

 --------- ------------------------------------------------------------------- 
| col_key | col_value                                                         |
 --------- ------------------------------------------------------------------- 
| key1    | value1 | value2 | value 3 | value 2 | value 1 | value 3 | value 2 |
 --------- ------------------------------------------------------------------- 

Algorithm - if consecutive values same - merge them into one.

I am using Oracle and listagg function:


select 
  col_key, 
  listagg(distinct col_value, ' | ') within group (order by col_key) as col_value
from
  sample_table
group by col_key
order by col_key

But, listagg function with distinct keyword removes duplicates.

So, is it possible to make this in Oracle (like in table 2)?

Oracle versions (12c and 18c)

CodePudding user response:

From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row comparisons and aggregate the adjacent duplicates and then you can use LISTAGG to aggregate the unique values:

SELECT col_key,
       LISTAGG(value, ' | ') WITHIN GROUP (ORDER BY mno) AS col_value
FROM   (SELECT t.*,
               ROWNUM AS rn -- You need to provide a way of getting this order!
        FROM table1 t)
MATCH_RECOGNIZE(
  PARTITION BY col_key
  ORDER     BY rn
  MEASURES
    MATCH_NUMBER()   AS mno,
    FIRST(col_value) AS value
  PATTERN (same_value )
  DEFINE
    same_value AS FIRST(col_value) = col_value
)
GROUP BY col_key;

Which, for the sample data:

CREATE TABLE Table1 (col_key, col_value) AS
SELECT 'key1', 'value1' FROM DUAL UNION ALL
SELECT 'key1', 'value2' FROM DUAL UNION ALL
SELECT 'key1', 'value3' FROM DUAL UNION ALL
SELECT 'key1', 'value2' FROM DUAL UNION ALL
SELECT 'key1', 'value2' FROM DUAL UNION ALL
SELECT 'key1', 'value2' FROM DUAL UNION ALL
SELECT 'key1', 'value1' FROM DUAL UNION ALL
SELECT 'key1', 'value1' FROM DUAL UNION ALL
SELECT 'key1', 'value3' FROM DUAL UNION ALL
SELECT 'key1', 'value3' FROM DUAL UNION ALL
SELECT 'key1', 'value3' FROM DUAL UNION ALL
SELECT 'key1', 'value2' FROM DUAL;

Outputs:

COL_KEY COL_VALUE
key1 value1 | value2 | value3 | value2 | value1 | value3 | value2

db<>fiddle here

CodePudding user response:

You said you'll edit the question and provide sorting column; so far, you didn't do that so I used ROWID instead. Change it with your own column (once you find out which one to use).

Read comments within code.

SQL> with
  2  temp as
  3    -- find COL_VALUE and the value that follows it (NEXT_VALUE in this query.
  4    -- As I already said, I used ROWID for sorting purposes)
  5    (select col_key,
  6       col_value,
  7       rowid rid,
  8       lead(col_value) over (partition by col_key order by rowid) next_value
  9     from sample_table
 10    ),
 11  temp2 as
 12    -- "new" COL_VALUE will be the "original" COL_VALUE if it is different from its
 13    -- next value (or - for the last row - if there's no next value)
 14    (select col_key,
 15       rid,
 16       case when col_value <> next_value or next_value is null then col_value
 17            else null
 18       end col_value
 19     from temp
 20    )
 21  -- finally, aggregate the result
 22  select col_key,
 23    listagg(col_value, ' | ') within group (order by rid) col_value
 24  from temp2
 25  group by col_key;

   COL_KEY COL_VALUE
---------- ------------------------------------------------------------
         1 val1 | val2 | val3 | val2 | val1 | val3 | val2

SQL>

CodePudding user response:

Here's another take on it:

WITH
 cte1 AS (SELECT COL_KEY, 
                 COL_VALUE, 
                 LAG(COL_VALUE) OVER (ORDER BY ROWNUM) AS PREV_COL_VALUE
            FROM TEST_TAB),
 cte2 AS (SELECT COL_KEY,
                 COL_VALUE,
                 CASE
                   WHEN COL_VALUE = PREV_COL_VALUE THEN 0
                   ELSE 1
                 END AS FLAG
            FROM cte1),
 cte3 AS (SELECT COL_KEY,
                 COL_VALUE
            FROM cte2
            WHERE FLAG = 1)
SELECT COL_KEY,
       LISTAGG(COL_VALUE, ' | ') WITHIN GROUP (ORDER BY COL_KEY, ROWNUM) AS COL_VALUES
  FROM cte3
  GROUP BY COL_KEY

This produces:

COL_KEY  COL_VALUES
-------  ------------------------------------------------------------
key1     value1 | value2 | value3 | value2 | value1 | value3 | value2

which matches your desired output, but without an ordering column in your data you're at the mercy of the database and how it chooses to return the data. Remember - a basic rule of relational databases is that tables are unordered collections of rows, and there must an ORDER BY clause to impose an order on those rows when they are returned by a query.

db<>fiddle here

  • Related