Home > Blockchain >  Is there an equivalent to OR clause in CONTAINSTABLE - FULL TEXT INDEX
Is there an equivalent to OR clause in CONTAINSTABLE - FULL TEXT INDEX

Time:03-01

I am trying to find a solution in order to improve the String searching process and I selected FULL-TEXT INDEX Strategy.

However, after implementing it, I still can see there is a performance hit when it comes to search by using multiple strings using multiple Full-Text Index tables with OR clauses.

(E.x. WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%'))

As a solution, I am trying to use CONTAINSTABLE expecting a performance improvement.

Now, I am facing an issue with CONTAINSTABLE when it comes to joining tables with a LEFT JOIN

Please go through the example below.

Query 1

SELECT F.Name,p.*
        FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
LEFT JOIN CONTAINSTABLE(P.Building,*,'%John%') AS FFTIndex ON F.ID = FFTIndex.[Key]
LEFT JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
LEFT JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
LEFT JOIN P.Person p ON pr2.ID = p.PID
LEFT JOIN CONTAINSTABLE(P.Person,FirstName,'%John%') AS PFTIndex ON P.ID = PFTIndex.[Key]
WHERE F.Name IS NOT NULL

This produces the below result.

Query 1 Output

Query 2

SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
INNER JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
INNER JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
INNER JOIN P.Person p ON pr2.ID = p.PID
WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%') 
AND F.Name IS NOT NULL

Result

Query 2 Output

Expectation To use query 1 in a way that works as the behavior of an SQL SERVER OR clause. As I can understand Query 1's CONTAINSTABLE, joins the data with the building table, and the rest of the results are going to ignore so that the CONTAINSTABLE of the Person table gets data that already contains the keyword filtered from the building table.

If the keyword = Building, I want to match the keyword in both the tables regardless of searching a saved record in both the tables. Having a record in each table is enough.

Summary

Query 2 performs well but is creates a slowness when the words in the indexes are growing. Query 1 seems optimized(When it comes to multiple online resources and MS Documentation), however, it does not give me the expected output.

Is there any way to solve this problem?

I am not strictly attached to CONTAINSTABLE. Suggesting another optimization method will also be considerable. Thank you.

CodePudding user response:

Hard to say definitively without your full data set but a couple of options to explore

Remove Invalid % Wildcards

Why are you using '%SearchTerm%'? Does performance improve if you use the search term without the wildcards (%)? If you want a word that matches a prefix, try something like WHERE CONTAINS (String,'"SearchTerm*"')

Try Temp Tables

My guess is CONTAINS is slightly faster than CONTAINSTABLE as it doesn't calculate a rank, but I don't know if anyone has ever attempted to benchmark it. Either way, I'd try saving off the matches to a temp table before joining up to the rest of the tables. This will allow the optimizer to create a better execution plan

SELECT ID INTO #Temp 
FROM YourTable 
WHERE CONTAINS (String,'"SearchTerm"')

SELECT *
FROM #Temp
INNER JOIN...

Optimize Full Text Index by Removing Noisy Words

You might find you have some noisy words aka words that reoccur many times in your data that are meaningless like "the" or perhaps some business jargon. Adding these to your stop list will mean your full text index will ignore them, making your index smaller thus faster

The query below will list indexed words with the most frequent at the top

Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.YourTable') /*Replace with your table name*/)
Order By document_count Desc

This OR That Criteria

For your WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%') criteria where you want this or that, is tricky. OR clauses generally perform even when using simple equality operators. I'd try either doing two queries and union the results like:

SELECT * FROM Table1 F
/*Other joins and stuff*/
WHERE CONTAINS(F.*,'%Gayan%')
UNION
SELECT * FROM Table2 P 
/*Other joins and stuff*/
WHERE CONTAINS(P.FirstName,'%John%') 

OR this is much more work, but you could load all your data into giant denormalized table with all your columns. Then apply a full text index to that table and adjust your search criteria that way. It'd probably be the fastest method searching, but then you'd have to ensure the data is sync between the denormalized table and the underlying normalized tables

SELECT B.*,P.* INTO DenormalizedTable
FROM Building AS B
INNER JOIN People AS P

CREATE FULL TEXT INDEX ft ON DenormalizedTable
etc...
  • Related