Home > database >  Hide String characters based on column in different table
Hide String characters based on column in different table

Time:05-20

I've written two queries below with results that look similar to this:

table : messages

message_body
Are you going to get the items from Walmart?
I am going to get the items from Target.
I picked up the items from target.
I am getting a computer from walmart.
I am going grocery shopping at Whole Foods.

table: companies

company_name
Walmart
Sprouts
Target
Whole Foods

I need to hide the names of the companies that show up in the companies table

My desired results are a query like this

message_body
I got the items from ***?
I am going to get the items from ***.
I picked up the items from ***.
I am getting a computer from ***.
I am going grocery shopping at ***.

Because the string values in messages can have punctuation I would need to remove all characters that are not alphanumeric to match properly with companies.

I am working in Snowsight, but I know MS server, so I can convert functions/procedures written in MS server to Snowflake. Unfortunately, I cannot do this in Python. Any ideas to get me going? Thanks

CodePudding user response:

WITH message_body(msg) as (
    select * from values
        ('Are you going to get the items from Walmart?'),
        ('I am going to get the items from Target.'),
        ('I picked up the items from target.'),
        ('I am getting a computer from walmart.'),
        ('I am going grocery shopping at Whole Foods.')
), company_name(name) as (
    select * from values
        ('Walmart'),
        ('Sprouts'),
        ('Target'),
        ('Whole Foods')
)
select 
    regexp_replace(m.msg, f.name, '***', 1, 1, 'i')
from message_body as m
left join company_name as f
    on m.msg ilike '%'||f.name||'%'
qualify row_number() over (partition by m.msg order by length(f.name) desc) = 1;

gives:

REGEXP_REPLACE(M.MSG, F.NAME, '***', 1, 1, 'I')
Are you going to get the items from ***?
I am getting a computer from ***.
I am going grocery shopping at ***.
I am going to get the items from ***.
I picked up the items from ***.

CodePudding user response:

To replace all occurrences, please try below:

WITH message_body(msg) as (
    select * from values
        ('Are you going to get the items from Walmart?'),
        ('I am going to get the items from Target.'),
        ('I picked up the items from target.'),
        ('I am getting a computer from walmart.'),
        ('I am going grocery shopping at Whole Foods.'),
        ('Walmart & Sprouts & Target')
), 
company_name(name) as (
    select * from values
        ('Walmart'),
        ('Sprouts'),
        ('Target'),
        ('Whole Foods')
),
company_name_list(name) as (
    select 
        listagg(name, '|')
    from 
         company_name
)
select 
    regexp_replace(m.msg, l.name, '***', 1, 0, 'i') as masked_name
from message_body as m,
company_name_list as l;
  • Related