Home > Software engineering >  How to check which words of text appear in database and their frequency
How to check which words of text appear in database and their frequency

Time:10-21

I have a long text and a database table that acts as a dictionary, with the columns term, entity_type and main_name.

I would like to compare the words from my text to the term column and recover the terms that appear in the text, their frequency, their entity_type and their main_name.

I could do this looping through every word of the text, but it would be too long. Is there any way to optimize this process?

CodePudding user response:

First as Mike Organek suggests split your text to table with a query like this

select distinct upper(w) as word 
  from regexp_split_to_table(:your_text, '[^\w] ') w;

Then inner join it with the table and make your statistics

with words_in_text as
(
 select distinct upper(w) as word
   from regexp_split_to_table(:your_text, '[^\w] ') w
)
select tt.term, count(*) as frequency, tt.entity_type, tt.main_name
 from words_in_text wit
inner join the_table tt 
   on tt.term = wit.word
group by tt.term, tt.entity_type, tt.main_name;

Please note that if term is the primary key of the_table then you can group by term only.

db-fiddle

  • Related