Home > Software design >  What is the equivalent of the SQL function REGEXP_EXTRACT in Azure Synapse?
What is the equivalent of the SQL function REGEXP_EXTRACT in Azure Synapse?

Time:10-12

I want to convert my code that I was running in Netezza (SQL) to Azure Synapse (T-SQL). I was using the built-in Netezza SQL function REGEXP_EXTRACT but this function is not built-in Azure Synapse.

Here's the code I'm trying to convert

-- Assume that "column_v1" has datatype Character Varying(3) and can take value between 0 to 999 or NULL
SELECT
    column_v1
  , REGEXP_EXTRACT(column_v1, '[0-9] ') as column_v2
FROM INPUT_TABLE
;

Thanks, John

CodePudding user response:

regexExtract() function is supported in Synapse.

In order to implement it, you need to use couple of things, here is a demo that i built, here im using the SalesLT.Customer data that is supported as a sample data in microsoft:

In Synapse -> Integrate tab:

  1. Create new pipeline
  2. Add dataflow activity to your pipline
  3. In dataflow activity: under settings tab -> create new data flow
  4. double click on the dataflow (it should open it) Add source (it can be blob storage / files on prem etc.)
  5. add a derived column transformation
  6. in derived column add new column (or override an existing column) in Expression: add this command regexExtract(Phone,'(\\d{3})') it will select the 3 first digits, since my data has dashes in it, its makes more sense to replace all characters that are not digits using regexReplace method: regexReplace(Phone,'[^0-9]', '')
  7. add sink

DataFlow activities:

enter image description here

derived column transformation:

enter image description here

Output:

enter image description here

please check MS docs:

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-derived-column

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions

CodePudding user response:

Regex_extract is not available in T-SQL. Thus, we try to do similar functionalities using Substring/left/right functions along with Patindex function

SELECT input='789A', 
extract= SUBSTRING('789A', PATINDEX('[0-9][0-9][0-9]', '789A'),4);

Result

enter image description here

Refer Microsoft documents patindex (T-SQL), substring (T-SQL) for additional information.

  • Related