Home > other >  SQL(Snowflake) - Removing duplicates from number extraction
SQL(Snowflake) - Removing duplicates from number extraction

Time:09-29

I am looking to extract these numbers from a string but only keep unique numbers. There are strings that contain more than 3 unique numbers in them as well.

How do I extract all numbers from a string while removing duplicates? (Most digits will be 7-8 characters in length)

     SELECT REGEXP_SUBSTR(REPLACE(string,' '),'[[:digit:]]{8}',1,1)||' '||REGEXP_SUBSTR(REPLACE(string,' '),'[[:digit:]]{8}',1,1)||' '||REGEXP_SUBSTR(REPLACE(string,' '),'[[:digit:]]{8}',1,1) as Num_Value 
FROM ( *select DESCRIPTION as string
      from...*)
    ) 
WHERE Num_Value IS NOT NULL

Strings

  1. "Contract No(s). 02241899, 02749981, (as..."
  2. "Contract No(s). 02515351,02747764,02707694 (as..."
  3. "Contract No(s). 02667112, (as..."

My Results

  1. 02241899 02241899 02241899
  2. 02515351 02515351 02515351
  3. 02667112 02667112 02667112

What I'm Looking For

  1. 02241899 02749981
  2. 02515351 02747764 02707694
  3. 02667112

CodePudding user response:

One way is to just not use the regex and instead split to table based on a reliable delimiter

with t1 (id, str) as
(select 1, 'Contract No(s). 02241899, 02749981, (as...') 
          
select distinct 
       t1.id, 
       t1.str,
       t2.value as contact
from t1,lateral split_to_table(replace(t1.str,' ',','), ',') t2
where try_cast(t2.value as integer) is not null and
      len(t2.value) in (7,8);
  • Related