How to check which words of text appear in database and their frequency-CodePudding

I have a long text and a database table that acts as a dictionary, with the columns term, entity_type and main_name.

I would like to compare the words from my text to the term column and recover the terms that appear in the text, their frequency, their entity_type and their main_name.

I could do this looping through every word of the text, but it would be too long. Is there any way to optimize this process?

CodePudding user response：

First as Mike Organek suggests split your text to table with a query like this

select distinct upper(w) as word 
  from regexp_split_to_table(:your_text, '[^\w] ') w;

Then inner join it with the table and make your statistics

with words_in_text as
(
 select distinct upper(w) as word
   from regexp_split_to_table(:your_text, '[^\w] ') w
)
select tt.term, count(*) as frequency, tt.entity_type, tt.main_name
 from words_in_text wit
inner join the_table tt 
   on tt.term = wit.word
group by tt.term, tt.entity_type, tt.main_name;

Please note that if term is the primary key of the_table then you can group by term only.

db-fiddle