I am trying to find a solution in order to improve the String searching process and I selected FULL-TEXT INDEX Strategy.
However, after implementing it, I still can see there is a performance hit when it comes to search by using multiple strings using multiple Full-Text Index tables with OR clauses.
(E.x. WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')
)
As a solution, I am trying to use CONTAINSTABLE
expecting a performance improvement.
Now, I am facing an issue with CONTAINSTABLE
when it comes to joining tables with a LEFT JOIN
Please go through the example below.
Query 1
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
LEFT JOIN CONTAINSTABLE(P.Building,*,'%John%') AS FFTIndex ON F.ID = FFTIndex.[Key]
LEFT JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
LEFT JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
LEFT JOIN P.Person p ON pr2.ID = p.PID
LEFT JOIN CONTAINSTABLE(P.Person,FirstName,'%John%') AS PFTIndex ON P.ID = PFTIndex.[Key]
WHERE F.Name IS NOT NULL
This produces the below result.
Query 2
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
INNER JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
INNER JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
INNER JOIN P.Person p ON pr2.ID = p.PID
WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')
AND F.Name IS NOT NULL
Result
Expectation
To use query 1 in a way that works as the behavior of an SQL SERVER OR
clause. As I can understand Query 1's CONTAINSTABLE
, joins the data with the building
table, and the rest of the results are going to ignore so that the CONTAINSTABLE
of the Person
table gets data that already contains the keyword filtered from the building
table.
If the keyword = Building
, I want to match the keyword in both the tables regardless of searching a saved record in both the tables. Having a record in each table is enough.
Summary
Query 2 performs well but is creates a slowness when the words in the indexes are growing. Query 1 seems optimized(When it comes to multiple online resources and MS Documentation), however, it does not give me the expected output.
Is there any way to solve this problem?
I am not strictly attached to CONTAINSTABLE
. Suggesting another optimization method will also be considerable.
Thank you.
CodePudding user response:
Hard to say definitively without your full data set but a couple of options to explore
Remove Invalid % Wildcards
Why are you using '%SearchTerm%'
? Does performance improve if you use the search term without the wildcards (%)? If you want a word that matches a prefix, try something like
WHERE CONTAINS (String,'"SearchTerm*"')
Try Temp Tables
My guess is CONTAINS is slightly faster than CONTAINSTABLE as it doesn't calculate a rank, but I don't know if anyone has ever attempted to benchmark it. Either way, I'd try saving off the matches to a temp table before joining up to the rest of the tables. This will allow the optimizer to create a better execution plan
SELECT ID INTO #Temp
FROM YourTable
WHERE CONTAINS (String,'"SearchTerm"')
SELECT *
FROM #Temp
INNER JOIN...
Optimize Full Text Index by Removing Noisy Words
You might find you have some noisy words aka words that reoccur many times in your data that are meaningless like "the" or perhaps some business jargon. Adding these to your stop list will mean your full text index will ignore them, making your index smaller thus faster
The query below will list indexed words with the most frequent at the top
Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.YourTable') /*Replace with your table name*/)
Order By document_count Desc
This OR That Criteria
For your WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')
criteria where you want this or that, is tricky. OR clauses generally perform even when using simple equality operators.
I'd try either doing two queries and union the results like:
SELECT * FROM Table1 F
/*Other joins and stuff*/
WHERE CONTAINS(F.*,'%Gayan%')
UNION
SELECT * FROM Table2 P
/*Other joins and stuff*/
WHERE CONTAINS(P.FirstName,'%John%')
OR this is much more work, but you could load all your data into giant denormalized table with all your columns. Then apply a full text index to that table and adjust your search criteria that way. It'd probably be the fastest method searching, but then you'd have to ensure the data is sync between the denormalized table and the underlying normalized tables
SELECT B.*,P.* INTO DenormalizedTable
FROM Building AS B
INNER JOIN People AS P
CREATE FULL TEXT INDEX ft ON DenormalizedTable
etc...