Having a table T1 contains
TITLE |
---|
age 5 alton john live |
show must go on |
Having a table T2 contains
NAME. | DESCRIPTION |
---|---|
John Bo | alton for kids |
Alton | show age kids |
I would like to split TITLE (T1) and check if at list one token exists in DESCRIPTION (T2) and return TITLE and DESCRIPTION
Expected Output:
TITLE | DESCRIPTION |
---|---|
age 5 alton john live. | alton for kids |
age 5 alton john live. | show age kids |
show must go on | show age kids |
CodePudding user response:
Main problem here is to wrangle table t1 to associate each "TITLE" value to each sequence of characters separated by space. Then you can match your wrangled table with your t2 table on a regex pattern. May require quite a bit of time as doing a join on such string condition is not very efficient, but it works.
You can generate that association with a recursive query, that exploits SUBSTRING_INDEX
to retrieve last word of "TITLE" and a numerical index that is decreased. Recursion would stop once that numerical index reaches 0.
WITH RECURSIVE cte AS (
SELECT TITLE,
LENGTH(TITLE)-LENGTH(REPLACE(TITLE,' ','')) AS num_words,
SUBSTRING_INDEX(
SUBSTRING_INDEX(TITLE, ' ',
LENGTH(TITLE)-
LENGTH(REPLACE(TITLE,' ','')) 1
), ' ', -1) AS word
FROM t1
UNION ALL
SELECT TITLE,
num_words - 1 AS num_words,
SUBSTRING_INDEX(SUBSTRING_INDEX(TITLE, ' ', num_words), ' ', -1)
FROM cte
WHERE num_words > 0
)
SELECT *
FROM cte
INNER JOIN t2
ON t2.DESCRIPTION REGEXP CONCAT('( |^)', cte.word, '( |$)')
Check the demo here.
Assumption: your sentences have no punctuation, and every word is separated by a space character.
CodePudding user response:
The following SQL query can be used to split the titles in T1
and check if at least one token exists in the descriptions in T2
:
SELECT T1.TITLE, T2.DESCRIPTION
FROM T1
JOIN T2
ON T1.TITLE LIKE '%' || T2.DESCRIPTION || '%'
The query uses a combination of the JOIN
and LIKE
operators to find the matching records between the tables T1
and T2
.