way to turn column into rows exasol sql-CodePudding

I have two columns that have list of rules and dates like below:

 a                         b                                            c
---------------           -----------------------------     
init, rule#062,rule#066   20210417124104,20210417132843,20210419132843  user1
init, rule#062            20210417124104,20210417132843                 user2
init                      20210417124104                                user3

Expected output:

a            b                c
init        20210417124104    user1
rule#062    20210417124104    user1
rule#066    20210419132843    user1
init        20210417124104    user2
rule#062    20210417124104    user2
init       20210417124104     user3

I need to change one row into the number of items that are there in the column.

the number of items in the list can be different as well not necessary two. I've to run it in exasol db so not all functions work there.

Thanks . Looking for support.

edited

i was able to get result for one single user correct with the answer below, but when i add two users, the answer set multiplies, i think there is some logic with the connect by and level that i dont really understand.

the rows are repeated and how do i start over with every new user

                       SELECT
                                u.master_user_id ,
                                u.user_id ,
                                
                                SUBSTR(regexp_substr(u.CONS_DATE_HIST, '[^,] ', 1, level), 1, 8) as date_id ,
                                
                                CASE
                                        WHEN LOWER(trim(regexp_substr(u.CONS_RULES_HIST, '[^,] ', 1, level))) = 'init'
                                        THEN 'init'
                                        ELSE SUBSTR(trim(regexp_substr(u.CONS_RULES_HIST, '[^,] ', 1, level)), 6)
                                END AS rule_nbr,
                                
                                level lvl 
                                
                              --  row_number() over(partition by master_user_id, level order by user_id) as rn
                                FROM
                                (
                                                SELECT client_id,
                                                master_user_id,
                                                user_id,
                                                CONS_DATE_HIST,
                                                CONS_RULES_HIST
                                                FROM
                                                ECOMBI_CL_0001100.users
                                                WHERE
                                                cast(load_date as date) > current_date - 4
                                                and user_id in (38043958 )
                                ) u 
                               
                                connect by   regexp_substr(u.CONS_DATE_HIST,  '[^,] ', 1, level) <> 'null' 
                                and          regexp_substr(u.CONS_RULES_HIST, '[^,] ', 1, level) <> 'null'
                               -- and          user_id<> user_id
                              --  and             user_id <> user_id
                               -- and             row_number() over(partition by master_user_id, level order by user_id) <> 2
                                
                       order by 2,4

MASTER_USER_ID  USER_ID DATE_ID RULE_NBR    LVL

37175   38043958    20211024    init    1
37175   38043958    20211024    035 2
37175   38043958    20211024    064 3
37175   38043958    20211025    035 4
37175   38043958    20211025    060 5

with two users:

MASTER_USER_ID  USER_ID DATE_ID RULE_NBR    LVL
37175   38043958    20211024    035 2
37175   38043958    20211024    035 2
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    035 4
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211025    060 5
37175   38043958    20211024    064 3
37175   38043958    20211024    064 3
37175   38043958    20211024    064 3
37175   38043958    20211024    064 3
37175   38043958    20211024    init    1
968389  38052591    20211024    012 2
968389  38052591    20211024    012 2
968389  38052591    20211024    060 3
968389  38052591    20211024    060 3
968389  38052591    20211024    060 3
968389  38052591    20211024    060 3
968389  38052591    20211024    init    1

any help please?

CodePudding user response：

OK I gave this a try can probbably be inproved upon:

with data(a,b,c) as (
select 'init, rule#062,rule#066'  , '20210417124104,20210417132843,20210419132843' , 'user1'  from dual union all
select 'init, rule#062' , '20210417124104,20210417132843', 'user2' from dual union all
select 'init' , '20210417124104', 'user3' from dual 

)
,at as (
select * from (
select distinct regexp_substr(data.a,'[^,] ', 1, level) a ,c,level lvl from data
connect by regexp_substr(data.a,'[^,] ', 1, level) is not null
))
,bt as (
select * from (
select distinct regexp_substr(data.b,'[^,] ', 1, level) b ,c,level lvl from data
connect by regexp_substr(data.b,'[^,] ', 1, level) is not null
))
select at.a,bt.b,bt.c
from at
join bt on at.lvl = bt.lvl
and at.c = bt.c
order by bt.c,bt.b
;

I noticed when I was done you wanted a solution for ms-sql This is for oracle SQL, Hope it can provide some help anyway...

CodePudding user response：

This is a good use case for UDF (User Defined Functions). We want to create multiple rows from a single row, therefore we need to use an EMITS udf. If you just want to transform single rows, a SCALAR udf would be enough. I chose to use Lua, because it has the least overhead, but you can also write the logic in Python, Java, or R. See here for more information on the supported languages.

In the following example, I split the lists on a comma as well as on single space, if you really just want to split on comma use '([^,] )' as a regex instead. This example assumes that both lists have the same number of elements.

create schema s;

create or replace table T as values ('init, rule#062,rule#066', '20210417124104,20210417132843,20210419132843' , 'user1'),
('init, rule#062' , '20210417124104,20210417132843', 'user2'),
('init' , '20210417124104', 'user3') as T(a, b, c) ;


--/
CREATE OR REPLACE LUA SCALAR SCRIPT split_lists(first_list varchar(2000000), second_list VARCHAR(2000000)) 
EMITS(first_list varchar(2000000), second_list varchar(2000000)) AS 
function run(ctx)
 first_list = ctx.first_list
 second_list = ctx.second_list
 
 local first_split = {}
 local second_split = {}
  
 for word in string.gmatch(first_list, '([^, ] )') do
    first_split[#first_split   1] = word
 end
 for word in string.gmatch(second_list, '([^, ] )') do
    second_split[#second_split   1] = word
 end
 
 for i = 1,#first_split do
   ctx.emit(first_split[i], second_split[i])
 end
end
/

select split_lists(a,b),c from t;

-- Result:
/*
FIRST_LIST SECOND_LIST    C     
---------- -------------- ----- 
init       20210417124104 user1 
rule#062   20210417132843 user1 
rule#066   20210419132843 user1 
init       20210417124104 user2 
rule#062   20210417132843 user2 
init       20210417124104 user3 
*/