Say I have the following query:
Select
COLUMN as RAW_COLUMN
,REGEX_REPLACE(COLUMN,'REGEX_PATTERN_HERE','REPLACEMENT_STRING_HERE') as CLEAN_COLUMN
FROM TABLE
As you can see above I created a Javascript function that allows me to replace a column value using regex patterns. Below is the function:
CREATE OR REPLACE FUNCTION "REGEX_REPLACE"("SUBJECT" VARCHAR(16777216), "PATTERN" VARCHAR(16777216), "REPLACEMENT" VARCHAR(16777216))
RETURNS VARCHAR(16777216)
LANGUAGE JAVASCRIPT
AS '
const p = SUBJECT;
let regex = new RegExp(PATTERN, ''i'')
return p.replace(regex, REPLACEMENT);
';
I need to clean the data with several regex replacements. What I'm trying to do is create a dictionary that will hold the regex pattern and replacement string that will be used to loop through and execute the regex_replace()
function. In other words, run REGEX_REPLACE()
function inside the SQL query itself over and over again until it goes through the entire dictionary.
//Key:value rule for regex_pattern_dict={'THE PATTERN HERE':'WHAT YOU WANT TO REPLACE IT WITH'}
regex_pattern_dict ={
'[0-9]{2}[\/\.][0-9]{2}':'' -- remove dates i.e. "08/02"
,'[0-9]{3}[-][0-9]{3}[-][0-9]{4}':'' -- remove phone numbers i.e. "888-957-4675"
,'[ ][0-9]{11}': '' -- remove phone numbers i.e. " 18882467822"
};
In other words, in the SQL query itself I'm trying to create a loop with the Javascript REGEX_REPLACE
function and dictionary (dictionary is also in javascript).
I guess what I'm trying to do is something like this:
Select
COLUMN as RAW_COLUMN
--,REGEX_REPLACE(COLUMN,'REGEX_PATTERN_HERE','REPLACEMENT_STRING_HERE') as CLEAN_COLUMN
,for (const [PATTERN, REPLACEMENT] of Object.entries(regex_pattern_dict)) {
REGEX_REPLACE(COLUMN,PATTERN,REPLACEMENT);
} AS CLEAN_COLUMN
FROM TABLE
Any ideas how to approach this? Not sure how to put a javascript loop in SQL.
With the help of @David Garrison the below code was the solution:
CREATE or replace PROCEDURE TABLE_CLEAN()
RETURNS VARCHAR
LANGUAGE javascript
AS
$$
//Regex Cleaning Dictionary
const regex_pattern_dict ={
"[0-9]{2}[\/\.][0-9]{2}":"" // remove dates i.e. "08/02"
,"[0-9]{3}[-][0-9]{3}[-][0-9]{4}":"" // remove phone numbers i.e. "888-957-4675"
,"[ ][0-9]{11}": "" // remove phone numbers i.e. " 18882467822"
};
//Column to Clean
var RegexStr = "COLUMN";
//Key = Regex Pattern, Value = Replacement
for (const [key, value] of Object.entries(regex_pattern_dict)) {
RegexStr = "REXP_REPLACE_ME(" RegexStr ",'" key "','" value "')"
}
var rs = snowflake.execute( { sqlText:
`
CREATE OR REPLACE VIEW DATABASE.SCHEMA.TABLE AS
SELECT
COLUMN
,` RegexStr ` as clean_column
FROM DATABASE.SCHEMA.TABLE_ORIGINAL
`
} );
$$;
call TABLE_CLEAN();
select * from DATABASE.SCHEMA.TABLE
CodePudding user response:
updated with more correct syntax. Full solution is above in the edited question
var RegexStr = "COLUMN";
for (const [key, value] of Object.entries(regex_pattern_dict)) {
RegexStr = "REXP_REPLACE_ME(" RegexStr ",'" key "','" value "')"
}
sql = `select
raw_column,
` RegexStr `as clean_column
from table`
sql_statement = snowflake.createStatement({sqlText: sql });
result_set = sql_statement.execute();
Note: This may require using a procedure instead of a function.