I want to convert my code that I was running in Netezza (SQL) to Azure Synapse (T-SQL). I was using the built-in Netezza SQL function REGEXP_EXTRACT but this function is not built-in Azure Synapse.
Here's the code I'm trying to convert
-- Assume that "column_v1" has datatype Character Varying(3) and can take value between 0 to 999 or NULL
SELECT
column_v1
, REGEXP_EXTRACT(column_v1, '[0-9] ') as column_v2
FROM INPUT_TABLE
;
Thanks, John
CodePudding user response:
regexExtract()
function is supported in Synapse.
In order to implement it, you need to use couple of things, here is a demo that i built, here im using the SalesLT.Customer data that is supported as a sample data in microsoft:
In Synapse -> Integrate tab:
- Create new pipeline
- Add dataflow activity to your pipline
- In dataflow activity: under settings tab -> create new data flow
- double click on the dataflow (it should open it) Add source (it can be blob storage / files on prem etc.)
- add a derived column transformation
- in derived column add new column (or override an existing column) in Expression: add this command
regexExtract(Phone,'(\\d{3})')
it will select the 3 first digits, since my data has dashes in it, its makes more sense to replace all characters that are not digits using regexReplace method:regexReplace(Phone,'[^0-9]', '')
- add sink
DataFlow activities:
derived column transformation:
Output:
please check MS docs:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-derived-column
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions
CodePudding user response:
Regex_extract is not available in T-SQL. Thus, we try to do similar functionalities using Substring/left/right functions along with Patindex function
SELECT input='789A',
extract= SUBSTRING('789A', PATINDEX('[0-9][0-9][0-9]', '789A'),4);
Result
Refer Microsoft documents patindex (T-SQL), substring (T-SQL) for additional information.