Home > Software engineering >  Teradata SQL Regular expression - dealing with consecutive delimiter
Teradata SQL Regular expression - dealing with consecutive delimiter

Time:12-19

I am trying to use regexp_substr to break up table data held in one cell to the individual fields.

the data is , delimited. individual cells can also contain , within quotes and finally some cells can be unpopulated

My sample logic is working for the first 2 requirements but i can't sort the third, please help!

the issue is b4 should be null but it is being returned as F.

SEL
'a, b, c,, F,"d, e, f", g, h' AS f1,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,1) AS b1,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,2) AS b2,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,3) AS b3,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,4) AS b4,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,5) AS b5,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,6) AS b6,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,7) AS b7,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,8) AS b8,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,9) AS b9,
RegExp_Substr(f1,'(".*?"|[^",\s] )(?=,|$)',1,10) AS b10
;

Thanks

JF

CodePudding user response:

Your input looks like CSV-data. If the number of columns is constant you can utilize the CSVLD table function:

WITH cte AS 
 (  -- base select here
   SELECT 'a, b, c,, F,"d, e, f", g, h'  as f1
   --FROM mytable
 )
SELECT *
FROM TABLE
 (
   CSVLD
    (cte.f1  -- input column
    ,','     -- delimiter character
    ,'"'     -- quote character
    )
   RETURNS
    (
      b1  VarChar(11) CHARACTER SET Unicode
     ,b2  VarChar(11) CHARACTER SET Unicode
     ,b3  VarChar(11) CHARACTER SET Unicode
     ,b4  VarChar(11) CHARACTER SET Unicode
     ,b5  VarChar(11) CHARACTER SET Unicode
     ,b6  VarChar(11) CHARACTER SET Unicode
     ,b7  VarChar(11) CHARACTER SET Unicode
     ,b8  VarChar(11) CHARACTER SET Unicode
    )
 ) AS t
;

If your input column is LATIN remove the CHARACTER SET Unicode frm the output columns.

CodePudding user response:

This regex works for your sample case:

(?:,|^) ?(".*?"|[^,]*)

You'll need to use the second group of the match instead of the first.

  • Related