How to create view by splitting string from column to columns (without using create function) in SQL-CodePudding

I've got a table "products" which consists of two columns: id, raw

Example:

id	raw
132	Description: mangoes; Cost: 15-30; Saving: 64; ER: E 51; EnR: D 56; Eligible: Y$ $ Description: oranges; Cost: 1,000-1,400; Saving: 41; ER: C 75; EnR: D 58; Eligible: Y
497	Description: bananas; Cost: 25; Saving: 9; ER: E 52; EnR: D 56; Eligible: N$ $ Description: apples; Cost: 4,000-14,000; Saving: 165; ER: E 47; EnR: E 53; Eligible: Y$ $ Description: pears; Cost: 800-1,200; Saving: 393; ER: C 73; EnR: D 55; Eligible: Y

This table needs to be converted to the final view like this:

id	Description	Cost	Saving	ER	EnR	Eligible
132	mangoes	15-30	64	E 51	D 56	Y
132	oranges	1,000-1,400	41	C 75	D 58	Y
497	bananas	25	9	E 52	D 56	N
497	apples	4,000-14,000	165	E 47	E 53	Y
497	pears	800-1,200	393	C 73	D 55	Y

I supposed to do that by the next steps:

using SPLIT() by '$ $ ' to split each 'raw' string to rows
using SPLIT() by ';' to split each row from the previous step to separate column and name it.

But I finally confused how to do that.

Could anyone suggest the solution or more elegant way to realize that?

I think the solution is on the surface, but I can't seem to get to it. Thanks.

CodePudding user response：

using SPLIT() by '$ $ ' to split each 'raw' string to rows

Split doesn't split into rows. It splits into array members. To split to rows, use a lateral join to the split_to_table table function:

select ID, split(VALUE, ';')
from t1, table(split_to_table(raw, '$ $'));

You could use string manipulation to form and parse and object, and then pull the columns from it like this:

with X as
(
select ID, parse_json('{"' || replace(replace(replace(trim(VALUE), ':', '":"'), ';','","'), '" ', '"') || '"}') as V
from t1, table(split_to_table(raw, '$ $'))
)
select   ID
        ,v:Description::string as "Description"
        ,v:Cost::string        as "Cost"
        ,v:Saving::string      as "Saving"
        ,v:ER::string          as "ER"
        ,v:EnR::string         as "ENR"
        ,v:Eligible::string    as "Eligible"
from X
;

However; though the title specifies "without using create function", it's going to make the situation cleaner. I wrote a Java UDF that takes an array of tokenized strings (separated by colons) and converts it into keys and values in an object.

create or replace function stk_array_to_object(A array)
returns object
language javascript
strict immutable
as
$$
    var kv;
    out = {};
    for(let i = 0; i < A.length; i  ) {
        kv = A[i].split(":");
        kv[0] = kv[0].trim();
        out[kv[0]] = kv[1].trim();
    }
    return out;
$$;

You can then get an object like this:

select ID, stk_array_to_object(split(VALUE, ';')) as V
from t1, table(split_to_table(raw, '$ $'))

From there, you could do something like a CTE to project the keys to columns and hide the object:

with X as
(
select ID, stk_array_to_object(split(VALUE, ';')) as V
from t1, table(split_to_table(raw, '$ $'))
)
select   ID
        ,v:Description::string as "Description"
        ,v:Cost::string        as "Cost"
        ,v:Saving::string      as "Saving"
        ,v:ER::string          as "ER"
        ,v:EnR::string         as "ENR"
        ,v:Eligible::string    as "Eligible"
from X;

CodePudding user response：

Another approach -

First CTE is raw data.

Second CTE (cte_1) splits each data at '$ $'. Important to note here is to capture index, which will be later used in pivot grouping. Without this index value as there is nothing unique across rows of same ID.

Third CTE (cte_2), again splits data at ';' and separates the key and value into separate columns, using split_part.

Final CTE (cte_3) does the pivot on key to get the final desired output.

with cte(id, raw) as
(select * from values
(132,'Description: mangoes; Cost: 15-30; Saving: 64; ER: E 51; EnR: D 56; Eligible: Y
$ $ Description: oranges; Cost: 1,000-1,400; Saving: 41; ER: C 75; EnR: D 58; Eligible: Y'),
(497,'Description: bananas; Cost: 25; Saving: 9; ER: E 52; EnR: D 56; Eligible: N
$ $ Description: apples; Cost: 4,000-14,000; Saving: 165; ER: E 47; EnR: E 53; Eligible: Y
$ $ Description: pears; Cost: 800-1,200; Saving: 393; ER: C 73; EnR: D 55; Eligible: Y')
), cte_1 as (
select id,c.index,c.value from cte, lateral flatten (input=>split(raw,'$ $')) c
), cte_2 as (
select c.id, c.index, trim(split_part(f.value,':',1),' ') key,
trim(split_part(f.value,':',2),' ') val
from cte_1 c,
lateral flatten (input=>split(c.value,';')) f
), cte_3 as (
select * from
(select id, index,key, val from cte_2) tab
pivot (max(val) for key in ('Description','Cost','Saving','ER','EnR','Eligible'))
)
select id,"'Description'","'Cost'","'Saving'","'ER'","'EnR'","'Eligible'" 
from cte_3;

ID	'Description'	'Cost'	'Saving'	'ER'	'EnR'	'Eligible'
132	mangoes	15-30	64	E 51	D 56	Y
132	oranges	1,000-1,400	41	C 75	D 58	Y
497	pears	800-1,200	393	C 73	D 55	Y
497	bananas	25	9	E 52	D 56	N
497	apples	4,000-14,000	165	E 47	E 53	Y

Note - in the data-section (main cte) there are carriage returns (around $ $) to avoid too much right-scrolling. Might need to adjust that if someone copy-pastes the example to test run.