I am trying to use regexp_substr to break up table data held in one cell to the individual fields.
the data is , delimited. individual cells can also contain , within quotes and finally some cells can be unpopulated
My sample logic is working for the first 2 requirements but i can't sort the third, please help!
the issue is b4 should be null but it is being returned as F.
SEL
'a, b, c,, F,"d, e, f", g, h' AS f1,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,1) AS b1,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,2) AS b2,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,3) AS b3,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,4) AS b4,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,5) AS b5,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,6) AS b6,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,7) AS b7,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,8) AS b8,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,9) AS b9,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,10) AS b10
;
Thanks
JF
CodePudding user response:
Your input looks like CSV-data. If the number of columns is constant you can utilize the CSVLD table function:
WITH cte AS
( -- base select here
SELECT 'a, b, c,, F,"d, e, f", g, h' as f1
--FROM mytable
)
SELECT *
FROM TABLE
(
CSVLD
(cte.f1 -- input column
,',' -- delimiter character
,'"' -- quote character
)
RETURNS
(
b1 VarChar(11) CHARACTER SET Unicode
,b2 VarChar(11) CHARACTER SET Unicode
,b3 VarChar(11) CHARACTER SET Unicode
,b4 VarChar(11) CHARACTER SET Unicode
,b5 VarChar(11) CHARACTER SET Unicode
,b6 VarChar(11) CHARACTER SET Unicode
,b7 VarChar(11) CHARACTER SET Unicode
,b8 VarChar(11) CHARACTER SET Unicode
)
) AS t
;
If your input column is LATIN remove the CHARACTER SET Unicode
frm the output columns.
CodePudding user response:
This regex works for your sample case:
(?:,|^) ?(".*?"|[^,]*)
You'll need to use the second group of the match instead of the first.