I have this PostgreSQL table for storing words:
CREATE TABLE IF NOT EXISTS words
(
id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
keyword character varying(300) COLLATE pg_catalog."default",
)
insert into words (words)
VALUES ('while swam is interesting');
CREATE TABLE IF NOT EXISTS trademarks
(
id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
trademark character varying(300) COLLATE pg_catalog."default",
)
Into table trademarks
I will have thousands of registered trademarks names.
I want to compare words stored into words
table keyword, do they match not only for a words but also for word which is in a group of words. For example:
I have a keyword while swam is interesting
stored into words.keyword
. I also have a trademark swam
located in trademarks.trademark
I have a word match, so I want to detect this using SQL.
I tried this:
select
w.id, w.keyword, t.trademark
from
words w
inner join
trademarks t
on
t.trademark ilike '%'||w.keyword||'%'
where
w.keyword = 'all';
But I get this result:
As you can see, the result is wrong. I need to match complete words, not parts of the words. How I can fix this?
CodePudding user response:
One option is to use regular expression as richyen mentions. If you need to use sql what you have got is correct, except you need to look for a pattern that uses the conditions '%SPACE'||w.keyword||'SPACE%'
select w.id, w.keyword, t.trademark
from words w
join trademarks t
on w.keyword like '% '||t.trademark||' %'
where t.trademark = 'all'
union all /*condition when there are only single words without a space*/
select w.id, w.keyword, t.trademark
from words w
join trademarks t
on w.keyword =t.trademark
where t.trademark = 'all';
CodePudding user response:
Split the words field on space to create an array, then use any
twice: once for the where (ie "any words string containing 'all') and once for the match (ie any trademark that appears in a words list that also contains 'all'):
select
w.id,
w.keyword,
t.trademark
from words w
join trademarks t on t.trademark =
any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))
This matches the whole word, so no need to worry about word boundaries etc.
CodePudding user response:
You can split the keyword into a array then later compare if the trademark is inside of the splited array.
select
w.id, w.keyword, t.trademark
from
words w
inner join
trademarks t
on
t.trademark = ANY( string_to_array(REPLACE(w.keyword , ' ' , '~^~' ),'~^~' , '') )
where
w.keyword = 'all';
trademark and keywords overlaps each other example:
SELECT
ARRAY['swam'] && string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');
keywords is inside of trademark example
SELECT
ARRAY['swam'] @> string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');