Home > Back-end >  How to compare a word is it found in table rows?
How to compare a word is it found in table rows?

Time:01-04

I have this PostgreSQL table for storing words:

CREATE TABLE IF NOT EXISTS words
(
    id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
    keyword character varying(300) COLLATE pg_catalog."default",
)

insert into words (words)
VALUES ('while swam is interesting');

CREATE TABLE IF NOT EXISTS trademarks
(
   id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
   trademark character varying(300) COLLATE pg_catalog."default",
)

Into table trademarks I will have thousands of registered trademarks names. I want to compare words stored into words table keyword, do they match not only for a words but also for word which is in a group of words. For example:

I have a keyword while swam is interesting stored into words.keyword. I also have a trademark swam located in trademarks.trademark I have a word match, so I want to detect this using SQL.

I tried this:

select 
    w.id, w.keyword, t.trademark 
from
    words w
inner join 
    trademarks t 
on
  t.trademark ilike  '%'||w.keyword||'%' 
where 
   w.keyword = 'all';

But I get this result:

enter image description here

As you can see, the result is wrong. I need to match complete words, not parts of the words. How I can fix this?

CodePudding user response:

One option is to use regular expression as richyen mentions. If you need to use sql what you have got is correct, except you need to look for a pattern that uses the conditions '%SPACE'||w.keyword||'SPACE%'

select w.id, w.keyword, t.trademark 
  from words w
  join trademarks t 
    on w.keyword like '% '||t.trademark||' %' 
 where t.trademark = 'all'
union all /*condition when there are only single words without a space*/
select w.id, w.keyword, t.trademark 
  from words w
  join trademarks t 
    on w.keyword =t.trademark
 where t.trademark = 'all';

https://dbfiddle.uk/k5bYAPUh

CodePudding user response:

Split the words field on space to create an array, then use any twice: once for the where (ie "any words string containing 'all') and once for the match (ie any trademark that appears in a words list that also contains 'all'):

select 
    w.id,
    w.keyword,
    t.trademark 
from words w
join trademarks t on t.trademark =
  any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))

This matches the whole word, so no need to worry about word boundaries etc.

CodePudding user response:

You can split the keyword into a array then later compare if the trademark is inside of the splited array.

select 
    w.id, w.keyword, t.trademark 
from
    words w
inner join 
    trademarks t 
on
  t.trademark = ANY( string_to_array(REPLACE(w.keyword , ' ' , '~^~' ),'~^~' , '') )
where 
   w.keyword = 'all';

trademark and keywords overlaps each other example:

SELECT
ARRAY['swam'] && string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');

keywords is inside of trademark example

SELECT
ARRAY['swam'] @> string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');
  • Related