I have two columns that have list of rules and dates like below:
a b c
--------------- -----------------------------
init, rule#062,rule#066 20210417124104,20210417132843,20210419132843 user1
init, rule#062 20210417124104,20210417132843 user2
init 20210417124104 user3
Expected output:
a b c
init 20210417124104 user1
rule#062 20210417124104 user1
rule#066 20210419132843 user1
init 20210417124104 user2
rule#062 20210417124104 user2
init 20210417124104 user3
I need to change one row into the number of items that are there in the column.
the number of items in the list can be different as well not necessary two. I've to run it in exasol db so not all functions work there.
Thanks . Looking for support.
edited
i was able to get result for one single user correct with the answer below, but when i add two users, the answer set multiplies, i think there is some logic with the connect by and level that i dont really understand.
the rows are repeated and how do i start over with every new user
SELECT
u.master_user_id ,
u.user_id ,
SUBSTR(regexp_substr(u.CONS_DATE_HIST, '[^,] ', 1, level), 1, 8) as date_id ,
CASE
WHEN LOWER(trim(regexp_substr(u.CONS_RULES_HIST, '[^,] ', 1, level))) = 'init'
THEN 'init'
ELSE SUBSTR(trim(regexp_substr(u.CONS_RULES_HIST, '[^,] ', 1, level)), 6)
END AS rule_nbr,
level lvl
-- row_number() over(partition by master_user_id, level order by user_id) as rn
FROM
(
SELECT client_id,
master_user_id,
user_id,
CONS_DATE_HIST,
CONS_RULES_HIST
FROM
ECOMBI_CL_0001100.users
WHERE
cast(load_date as date) > current_date - 4
and user_id in (38043958 )
) u
connect by regexp_substr(u.CONS_DATE_HIST, '[^,] ', 1, level) <> 'null'
and regexp_substr(u.CONS_RULES_HIST, '[^,] ', 1, level) <> 'null'
-- and user_id<> user_id
-- and user_id <> user_id
-- and row_number() over(partition by master_user_id, level order by user_id) <> 2
order by 2,4
MASTER_USER_ID USER_ID DATE_ID RULE_NBR LVL
37175 38043958 20211024 init 1
37175 38043958 20211024 035 2
37175 38043958 20211024 064 3
37175 38043958 20211025 035 4
37175 38043958 20211025 060 5
with two users:
MASTER_USER_ID USER_ID DATE_ID RULE_NBR LVL
37175 38043958 20211024 035 2
37175 38043958 20211024 035 2
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 035 4
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211025 060 5
37175 38043958 20211024 064 3
37175 38043958 20211024 064 3
37175 38043958 20211024 064 3
37175 38043958 20211024 064 3
37175 38043958 20211024 init 1
968389 38052591 20211024 012 2
968389 38052591 20211024 012 2
968389 38052591 20211024 060 3
968389 38052591 20211024 060 3
968389 38052591 20211024 060 3
968389 38052591 20211024 060 3
968389 38052591 20211024 init 1
any help please?
CodePudding user response:
OK I gave this a try can probbably be inproved upon:
with data(a,b,c) as (
select 'init, rule#062,rule#066' , '20210417124104,20210417132843,20210419132843' , 'user1' from dual union all
select 'init, rule#062' , '20210417124104,20210417132843', 'user2' from dual union all
select 'init' , '20210417124104', 'user3' from dual
)
,at as (
select * from (
select distinct regexp_substr(data.a,'[^,] ', 1, level) a ,c,level lvl from data
connect by regexp_substr(data.a,'[^,] ', 1, level) is not null
))
,bt as (
select * from (
select distinct regexp_substr(data.b,'[^,] ', 1, level) b ,c,level lvl from data
connect by regexp_substr(data.b,'[^,] ', 1, level) is not null
))
select at.a,bt.b,bt.c
from at
join bt on at.lvl = bt.lvl
and at.c = bt.c
order by bt.c,bt.b
;
I noticed when I was done you wanted a solution for ms-sql This is for oracle SQL, Hope it can provide some help anyway...
CodePudding user response:
This is a good use case for UDF (User Defined Functions). We want to create multiple rows from a single row, therefore we need to use an EMITS
udf. If you just want to transform single rows, a SCALAR
udf would be enough. I chose to use Lua, because it has the least overhead, but you can also write the logic in Python, Java, or R. See here for more information on the supported languages.
In the following example, I split the lists on a comma as well as on single space, if you really just want to split on comma use '([^,] )'
as a regex instead.
This example assumes that both lists have the same number of elements.
create schema s;
create or replace table T as values ('init, rule#062,rule#066', '20210417124104,20210417132843,20210419132843' , 'user1'),
('init, rule#062' , '20210417124104,20210417132843', 'user2'),
('init' , '20210417124104', 'user3') as T(a, b, c) ;
--/
CREATE OR REPLACE LUA SCALAR SCRIPT split_lists(first_list varchar(2000000), second_list VARCHAR(2000000))
EMITS(first_list varchar(2000000), second_list varchar(2000000)) AS
function run(ctx)
first_list = ctx.first_list
second_list = ctx.second_list
local first_split = {}
local second_split = {}
for word in string.gmatch(first_list, '([^, ] )') do
first_split[#first_split 1] = word
end
for word in string.gmatch(second_list, '([^, ] )') do
second_split[#second_split 1] = word
end
for i = 1,#first_split do
ctx.emit(first_split[i], second_split[i])
end
end
/
select split_lists(a,b),c from t;
-- Result:
/*
FIRST_LIST SECOND_LIST C
---------- -------------- -----
init 20210417124104 user1
rule#062 20210417132843 user1
rule#066 20210419132843 user1
init 20210417124104 user2
rule#062 20210417132843 user2
init 20210417124104 user3
*/