Home > Mobile >  Pattern match using regexp_extract_all
Pattern match using regexp_extract_all

Time:10-13

I am trying to build a array from this string and need help with pattern on regexp_extract_all.

Here is my input string contains keyword value pairs

BEGIN
 DECLARE p_JSON  STRING DEFAULT """
      {
        "instances": [{
        "LT_20MN_SalesContrctCnt": 388.0, 
        "Pyramid_Index": '', 
        "MARKET": "'Growth Markets','Europe'", 
        "SERVICE_DIM": "'S&C','F&M'",
        "SG_MD": "'All Service Group'"
      }]}
    """;

 SELECT split(x,":")[OFFSET(0)] as keyword, split(x,":")[OFFSET(1)] keyword_value
      FROM  unnest(split(REGEXP_REPLACE(JSON_EXTRACT(p_JSON, '$.instances'),r'([\'\"\[\]{}])', ''))) as x  
END;

The above SQL is failing at SPLIT due to , with in the data.

All I am trying to do here is build a two columns Keyword and value.

The idea here is if I can extract each row using REGEXP_EXTRACT_ALL with out the last "," then I should be able to split into keyword and keyword_value columns. Btw the names or number of keywords/values are not fixed.

Intended output from REGEXP_EXTRACT_ALL:

"LT_20MN_SalesContrctCnt": 388.0  
"Pyramid_Index": ''
"MARKET": "'Growth Markets','Europe'"
"SERVICE_DIM": "'S&C','F&M'"  
"SG_MD": "'All Service Group'"

Appreciate if you can suggest a better way to handle this.

Thanks in advance.

CodePudding user response:

Using your sample data, I just added an extra REGEXP_REPLACE to replace ," to @" so we can avoid splitting using ,. See approach below:

SELECT 
  SPLIT(arr,":")[OFFSET(0)] as keyword,
  SPLIT(arr,":")[OFFSET(1)] as keyword_value,
FROM sample_data,
UNNEST(SPLIT(REGEXP_REPLACE(REGEXP_REPLACE(JSON_EXTRACT(p_JSON, '$.instances'),r'[\[\]{}]',''),r',"','@"'),'@')) arr

Output:

enter image description here

  • Related