I've written two queries below with results that look similar to this:
table : messages
message_body |
---|
Are you going to get the items from Walmart? |
I am going to get the items from Target. |
I picked up the items from target. |
I am getting a computer from walmart. |
I am going grocery shopping at Whole Foods. |
table: companies
company_name |
---|
Walmart |
Sprouts |
Target |
Whole Foods |
I need to hide the names of the companies that show up in the companies table
My desired results are a query like this
message_body |
---|
I got the items from ***? |
I am going to get the items from ***. |
I picked up the items from ***. |
I am getting a computer from ***. |
I am going grocery shopping at ***. |
Because the string values in messages
can have punctuation I would need to remove all characters that are not alphanumeric to match properly with companies
.
I am working in Snowsight, but I know MS server, so I can convert functions/procedures written in MS server to Snowflake. Unfortunately, I cannot do this in Python. Any ideas to get me going? Thanks
CodePudding user response:
WITH message_body(msg) as (
select * from values
('Are you going to get the items from Walmart?'),
('I am going to get the items from Target.'),
('I picked up the items from target.'),
('I am getting a computer from walmart.'),
('I am going grocery shopping at Whole Foods.')
), company_name(name) as (
select * from values
('Walmart'),
('Sprouts'),
('Target'),
('Whole Foods')
)
select
regexp_replace(m.msg, f.name, '***', 1, 1, 'i')
from message_body as m
left join company_name as f
on m.msg ilike '%'||f.name||'%'
qualify row_number() over (partition by m.msg order by length(f.name) desc) = 1;
gives:
REGEXP_REPLACE(M.MSG, F.NAME, '***', 1, 1, 'I') |
---|
Are you going to get the items from ***? |
I am getting a computer from ***. |
I am going grocery shopping at ***. |
I am going to get the items from ***. |
I picked up the items from ***. |
CodePudding user response:
To replace all occurrences, please try below:
WITH message_body(msg) as (
select * from values
('Are you going to get the items from Walmart?'),
('I am going to get the items from Target.'),
('I picked up the items from target.'),
('I am getting a computer from walmart.'),
('I am going grocery shopping at Whole Foods.'),
('Walmart & Sprouts & Target')
),
company_name(name) as (
select * from values
('Walmart'),
('Sprouts'),
('Target'),
('Whole Foods')
),
company_name_list(name) as (
select
listagg(name, '|')
from
company_name
)
select
regexp_replace(m.msg, l.name, '***', 1, 0, 'i') as masked_name
from message_body as m,
company_name_list as l;