Home > Mobile >  TSQL Improving performance of Update cross apply like statement
TSQL Improving performance of Update cross apply like statement

Time:04-08

I have a client with a stored procedure that currently take 25 minutes to run. I have narrowed the cause of this to the following statement (changed column and table names)

UPDATE #customer_emails_tmp 
    SET #customer_emails_tmp.Possible_Project_Ref = cp.order_project_no, 
        #customer_emails_tmp.Possible_Project_id = cp.order_uid
    FROM #customer_emails_tmp e
    CROSS APPLY (
        SELECT TOP 1 p.order_project_no, p.order_uid
        FROM [order] p 
        WHERE e.Subject LIKE '%'   p.order_title   '%'
        AND p.order_date < e.timestamp
        ORDER BY p.order_date DESC
    ) as cp
    WHERE e.Possible_Project_Ref IS NULL;

There are 3 slightly different version of the above, joining to 1 of three tables. The issue is the CROSS APPLY LIKE '%' p.title '%'. I have tried looking into CONTAINS() and FREETEXT() but as far as my testing and investigations go, you cannot do CONTAINS(e.title, p.title) or FREETEXT(e.title,p.title).

Have I miss read something or is there a better way to write the above query?

Any help on this is much appreciated.

EDIT

Updated query to actual query used. Execution plan: https://www.brentozar.com/pastetheplan/?id=B1YPbJiX5

Tmp table has the following indexes:

CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_first_recipient ON #customer_emails_tmp (First_Recipient);
    CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_first_recipient_domain_name ON #customer_emails_tmp (First_Recipient_Domain_Name);
    CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_client_id ON #customer_emails_tmp (customer_emails_client_id);
    CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_subject ON #customer_emails_tmp ([subject]);

There is no index on the [order] table for column order_title

Edit 2

The purpose of this SP is to link orders (amongst others) to sent emails. This is done via multiple UPDATE statements; all other update statements are less than a second in length; however, this one ( and 2 others exactly the same but looking at 2 other tables) take an extraordinary amount of time.

I cannot remove the filter on Possible_Project_Ref IS NULL as we only want to update the ones that are null.

Also, I cannot change WHERE e.Subject LIKE '%' p.order_title '%' to WHERE e.Subject LIKE p.order_title '%' because the subject line may not start with the p.order_title, for example it could start with FW: or RE:

CodePudding user response:

Reviewing your execution plan, I think the main issue is you're reading a lot of data from the order table. You are reading 27,447,044 rows just to match up to find 783 rows. Your 20k row temp table is probably nothing by comparison.

Without knowing your data or desired business logic, here's a couple things I'd consider:

Updating First Round of Exact Matches

I know you need to keep your %SearchTerm% parameters, but some data might have exact matches. So if you run an initial update for exact matches, it will reduce the ones you have to search with %SearchTerm%

Run something like this before your current update

/*Recommended index for this update*/
CREATE INDEX ix_test ON [order](order_title,order_date) INCLUDE (order_project_no, order_uid)

UPDATE #customer_emails_tmp 
SET Possible_Project_Ref = cp.order_project_no
    ,Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
    SELECT TOP 1 p.order_project_no, p.order_uid
    FROM [order] p 
    WHERE e.Subject = p.order_title
    AND p.order_date < e.timestamp
    ORDER BY p.order_date DESC
) as cp
WHERE e.Possible_Project_Ref IS NULL;

Narrowing Search Range

This will technically change your matching criteria, but there are probably certain logical assumptions you can make that won't impact the final results. Here are a couple of ideas for you to consider, to get you thinking this way, but only you know your business. The end goal should be to narrow the data read from the order table

  • Is there a customer id you can match on? Something like e.customerID = p.customerID? Do you really match any email to any order?
  • Can you narrow your search date range to something like x days before timestamp? Do you really need to search all historical orders for all of time? Would you even want a match if an email matches to an order from 5 years ago? For this, try updating your APPLY date filter to something like p.order_date BETWEEN DATEADD(dd,-30,e.[timestamp]) AND e.[timestamp]

Other Miscellaneous Notes

  • If I'm understanding this correctly, you are trying to link email to some sort of project #. Ideally, when the email are generated, they would be linked to a project immediately. I know this is not always possible resource/time wise, but the clean solution is to calculate this at the beginning of the process, not afterwards. Generally anytime you have to use fuzzy string matching, you will have data issues. I know business always wants results "yesterday" and always pushes for the shortcut, and nobody ever wants to update legacy processes, but sometimes you need to if you want clean data
  • I'd review your indexes on the temp table. Generally I find the cost to create the indexes and for SQL Server to maintain them as I update the temp table is not worth it. So 9 times out of 10, I leave the temp table as a plain heap with 0 indexes

CodePudding user response:

First, filter the NULLs when you create #customer_emails_tmp, not after. Then you can lose: WHERE e.Possible_Project_Ref IS NULL. This way you are only bringing in rows you need instead of retrieving rows you don't need, then filtering them.

Next, us this for your WHERE clause:

WHERE EXISTS (SELECT 1 FROM [order] AS p WHERE p.order_date < e.timestamp)

If an order date doesn't have any later timestamps in e, none of the rows in e will be considered.

Next remove the timestamp filter from your APPLY subquery. Now your subquery looks like this:

SELECT TOP 1 p.order_project_no, p.order_uid
FROM     [order] AS p 
WHERE    e.Subject LIKE '%'   p.order_title   '%'
ORDER BY p.order_date DESC

This way you are applying your "Subject Like" filter to a much smaller set of rows. The final query would look like this:

UPDATE #customer_emails_tmp 
  SET #customer_emails_tmp.Possible_Project_Ref = cp.order_project_no, 
      #customer_emails_tmp.Possible_Project_id = cp.order_uid
  FROM #customer_emails_tmp e
  CROSS APPLY (
    SELECT TOP 1 p.order_project_no, p.order_uid
    FROM [order] p 
    WHERE e.Subject LIKE '%'   p.order_title   '%'
    ORDER BY p.order_date DESC
  ) as cp
  WHERE EXISTS (SELECT 1 FROM [order] AS p WHERE p.order_date < e.timestamp);
  • Related