I have a long text and a database table that acts as a dictionary, with the columns term, entity_type and main_name.
I would like to compare the words from my text to the term column and recover the terms that appear in the text, their frequency, their entity_type and their main_name.
I could do this by looping through every word of the text, but it would take too long. Is there any way to optimize this process?
CodePudding user response:
First, as Mike Organek suggests, split your text into a table of words with a query like this:
select distinct upper(w) as word
from regexp_split_to_table(:your_text, '[^\w]+') w;  -- split on runs of non-word characters
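For example, with a literal sentence in place of the :your_text parameter (the sample text is purely illustrative):
select distinct upper(w) as word
from regexp_split_to_table('Aspirin reduces fever; aspirin also thins blood.', '[^\w]+') w;
This returns one row per distinct word: ASPIRIN, REDUCES, FEVER, ALSO, THINS, BLOOD.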
Then inner join it with your dictionary table and compute the statistics:
with words_in_text as
(
    -- every word occurrence, normalized to upper case; no distinct here,
    -- so count(*) below reflects how often each word occurs in the text
    select upper(w) as word
    from regexp_split_to_table(:your_text, '[^\w]+') w
)
select tt.term, count(*) as frequency, tt.entity_type, tt.main_name
from words_in_text wit
inner join the_table tt
    on tt.term = wit.word
group by tt.term, tt.entity_type, tt.main_name;
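As a sanity check, here is a minimal, hypothetical setup (the table definition and sample rows are assumptions, not your real schema), with the dictionary terms stored in upper case so they match upper(w):
-- hypothetical dictionary table for illustration only
create table the_table (
    term        text primary key,
    entity_type text,
    main_name   text
);

insert into the_table (term, entity_type, main_name) values
    ('ASPIRIN', 'drug',    'acetylsalicylic acid'),
    ('FEVER',   'symptom', 'pyrexia');
Against the sample sentence used above, the query returns ASPIRIN with frequency 2 and FEVER with frequency 1.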
Please note that if term is the primary key of the_table, then you can group by term only, since the other columns are functionally dependent on it.
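One assumption worth checking: the join only matches if term is stored in upper case. If it is not, compare on upper(tt.term) = wit.word instead, and consider an expression index so the lookup stays fast on a large dictionary (the index name below is just an example):
-- only needed if term is not already stored in upper case
create index the_table_upper_term_idx on the_table (upper(term));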