Home > database >  Count the occurrences of a group of words in a text column in SQL
Count the occurrences of a group of words in a text column in SQL

Time:02-26

I have two tables as follows :

CREATE TABLE keyword_tbl
(
    WORDS VARCHAR(100), 
    TOPIC VARCHAR(100)
); 

INSERT INTO keyword_tbl 
VALUES ('leaf', 'nature'), ('leaves', 'nature'),
       ('wind', 'nature'), ('knife', 'utensils'),
       ('knives', 'utensils'), ('calf', 'animal'),
       ('calves', 'animal')

CREATE TABLE content
(
    CONTENT_ID VARCHAR(100), 
    DESCRIPTION VARCHAR(100)
); 

INSERT INTO content 
VALUES ('uuid1', 'leaves fall in autumn like leafs'),
       ('uuid2', 'the calf is playing in the leaf, the knife' ),
       ('uuid3', 'knives cutting the wind'),
       ('uuid4', 'he says hi'),  
       ('uuid5', 'the calves running through the wind')

I want to be able to count the occurrences of each word per topic. My ideal output would look as follows.

content_id description nature utensils animal
uuid1 leaves fall in autumn like leafs 2 0 0
uuid2 the calf is playing in the leaf, the knife 1 1 1
uuid3 knives cutting the wind 1 1 0
uuid4 he says hi 0 0 0
uuid5 the calves running through the wind 1 0 1
------------ --------------------------------------------- -------- ---------- --------

Explanation :

  • For uuid1, we count leaves and leaf hence nature has a value of 2,
  • For uuid2, we count calf, leaf, knife hence nature, utensils and animal have a count of 1, etc...

Is there a way for this to be done autonomously?

CodePudding user response:

Create Split Function like this

CREATE FUNCTION [dbo].[Split]
(
    @String varchar(8000), @Delimiter char(1)
)       
returns @temptable TABLE (items varchar(8000))       
as       
begin       
    declare @idx int       
    declare @slice varchar(8000)       
      
    select @idx = 1       
        if len(@String)<1 or @String is null  return       
      
    while @idx!= 0       
    begin       
        set @idx = charindex(@Delimiter,@String)       
        if @idx!=0       
            set @slice = left(@String,@idx - 1)       
        else       
            set @slice = @String       
          
        if(len(@slice)>0)  
            insert into @temptable(Items) values(@slice)       
  
        set @String = right(@String,len(@String) - @idx)       
        if len(@String) = 0 break       
    end   
return       
end  

Use To get string count from your table

select CONTENT_ID,DESCRIPTION,
(select COUNT(1) from keyword_tbl where WORDS in (select items from Split(DESCRIPTION,' ')) AND TOPIC = 'nature') as animal,
(select COUNT(1) from keyword_tbl where WORDS in (select items from Split(DESCRIPTION,' ')) AND TOPIC = 'utensils') as nature,
(select COUNT(1) from keyword_tbl where WORDS in (select items from Split(DESCRIPTION,' ')) AND TOPIC = 'animal') as utensils from content

here is string split from ' ' your string in leafs and leaf are different that is not count.

CodePudding user response:

Using enter image description here

To handle "leaf", "leafs" the join condition needs to be altered:

 -- substring
 ON  TRIM(s.value) ILIKE  k.words|| '%'

 -- only 's'
 ON  TRIM(s.value) ILIKE ANY (k.words, k.words|| 's')

Output:

enter image description here

  • Related