SQL split string and find if token exists in other table-CodePudding

Having a table T1 contains

TITLE
age 5 alton john live
show must go on

Having a table T2 contains

NAME.	DESCRIPTION
John Bo	alton for kids
Alton	show age kids

I would like to split TITLE (T1) and check if at list one token exists in DESCRIPTION (T2) and return TITLE and DESCRIPTION

Expected Output:

TITLE	DESCRIPTION
age 5 alton john live.	alton for kids
age 5 alton john live.	show age kids
show must go on	show age kids

CodePudding user response：

Main problem here is to wrangle table t1 to associate each "TITLE" value to each sequence of characters separated by space. Then you can match your wrangled table with your t2 table on a regex pattern. May require quite a bit of time as doing a join on such string condition is not very efficient, but it works.

You can generate that association with a recursive query, that exploits SUBSTRING_INDEX to retrieve last word of "TITLE" and a numerical index that is decreased. Recursion would stop once that numerical index reaches 0.

WITH RECURSIVE cte AS (
    SELECT TITLE, 
           LENGTH(TITLE)-LENGTH(REPLACE(TITLE,' ','')) AS num_words,
           SUBSTRING_INDEX(
               SUBSTRING_INDEX(TITLE, ' ', 
                               LENGTH(TITLE)-
                               LENGTH(REPLACE(TITLE,' ','')) 1
               ), ' ', -1) AS word
    FROM t1
    UNION ALL
    SELECT TITLE,
           num_words - 1 AS num_words,
           SUBSTRING_INDEX(SUBSTRING_INDEX(TITLE, ' ', num_words), ' ', -1)
    FROM cte
    WHERE num_words > 0
)
SELECT *
FROM       cte
INNER JOIN t2
        ON t2.DESCRIPTION REGEXP CONCAT('( |^)', cte.word, '( |$)')

Check the demo here.

Assumption: your sentences have no punctuation, and every word is separated by a space character.

CodePudding user response：

The following SQL query can be used to split the titles in T1 and check if at least one token exists in the descriptions in T2:

SELECT T1.TITLE, T2.DESCRIPTION
FROM T1
JOIN T2
ON T1.TITLE LIKE '%' || T2.DESCRIPTION || '%'

The query uses a combination of the JOIN and LIKE operators to find the matching records between the tables T1 and T2.