Performance slow in query its getting slow due to DATEDIFF function-CodePudding

I am writing a SQL query which gives me a slow performance. Because of DATEDIFF function that it gives me no any result into mails. Please help me to remake this query so that my output results faster. I will put the query below

SELECT DISTINCT isnull(hrr.SourceEmailID,'')
      ,''
      ,''
      ,hrr.RID
      ,hrr.ResID
      ,hrr.ReqID
      ,'Interview Rejected To Employee'
      ,(
       SELECT TOP 1
         RID
       FROM HCM_TEMPLATE_LIBRARY WITH (NOLOCK)
       WHERE Title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
       )
      ,GETUTCDATE()
      ,hrr.CreatedUserID
      ,0
     FROM hc_resume_bank hrb WITH (NOLOCK)
     INNER JOIN hc_req_resume hrr WITH (NOLOCK)
         ON hrr.resid = HRB.rid
     INNER JOIN HC_REQ_RESUME_STAGE_STATUS hrrss WITH (NOLOCK) ON hrrss.ReqResID = hrr.RID
     INNER JOIN HCM_RESUME_SOURCE hrs WITH (NOLOCK) ON hrs.RID = hrr.SourceID
     INNER JOIN HC_REQ_RES_INTERVIEW_STAGES hrris ON hrris.ReqResId = hrr.RID
     WHERE hrrss.stageid = 4 
      AND hrrss.statusid = 9 
      AND hrr.SourceID = 4 
      AND isnull(hrb.SourceEmailId, '') <> ''
      AND isnull(hrr.SourceEmailId, '') <> ''
      and hrr.AddedType=10 
      AND Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5

CodePudding user response：

Assuming that you have established that datediff is the root cause of poor performance, I suggest changing this:

Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5

to this:

hrrss.StatusDate >= DATEADD(MI, -5, GETDATE())

This assumes dates in StatusDate are same timezone as the server.

CodePudding user response：

DISTINCT in a query is almost always an indicator for a badly written query, where the author joins a lot of tables, builds a huge intermediate result thus that they must then boil down to its real size with DISTINCT. This is a costly operation. It seems to apply to your query. If you simply want to make sure that the hc_req_resume.resid has an entry in hc_resume_bank with a sourceemailid, then use EXISTS or IN for this lookup, not a join.

Your query with appropriate lookup clauses:

SELECT 
  ISNULL(hrr.sourceemailid,'')
  ,''
  ,''
  ,hrr.rid
  ,hrr.resid
  ,hrr.reqid
  ,'Interview Rejected To Employee'
  ,(
     SELECT TOP 1
       rid
     FROM hcm_template_library
     WHERE title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
   )
  ,GETUTCDATE()
  ,hrr.createduserid
  ,0
FROM hc_req_resume hrr
WHERE hrr.sourceid = 4 
AND hrr.addedtype = 10 
AND hrr.resid IN 
(
  SELECT hrb.rid 
  FROM hc_resume_bank hrb 
  WHERE hrb.sourceemailid <> ''
)
AND hrr.rid IN 
(
  SELECT hrrss.reqresid 
  FROM hc_req_resume_stage_status hrrss 
  WHERE hrrss.stageid = 4 
  AND hrrss.statusid = 9 
  AND hrrss.statusdate >= DATEADD(MI, -5, GETUTCDATE())
)
AND hrr.sourceid IN (SELECT hrs.rid FROM hcm_resume_source hrs)
AND hrr.rid IN (SELECT hrris.reqresid FROM hc_req_res_interview_stages);

The naming of the columns doesn't make things easier here. Why is the column sometimes called rid and sometimes reqresid? And then I see a rid combined with a resid. Is this just yet another name for the same thing? Or are there two meanings of a rid? And what is the table called the ID actually refers to? Is there a table called r or reqres or res? It doesn't seem so, but why does the ID of the table have a different name from the table, so the reader must guess what is what? We cannot even make much of a guess, if it is possible for a rid not to have a match in hc_req_res_interview_stages or for a sourceid not to have a match in hcm_resume_source. Usually you have a foreign key constraint on IDs, so either the ID is null (if this is allowed) or it does have a match in the parent table. A lookup would we pointless. Is it in your query? Or arent those tables the parent tables, but just other child tables refering to the same parent?

Remove any lookups that are not needed. The lookups in hcm_resume_source and hc_req_res_interview_stages may be such candidates, but I cannot know.

At last you want appropriate indexes. For hc_req_resume this may be something like

create index idx1 on hc_req_resume (sourceid, addedtype, rid, resid);

Then you may want:

create index idx2 on hc_resume_bank (rid) where sourceemailid <> '';
create index idx3 on hc_req_resume_stage_status (stageid, statusid, statusdate, reqresid);

The order of the columns in the indexes should be adjusted according to their selectivity.

CodePudding user response：

Salmon A has a great answer that I'd like to expand on.

Similar to why Salman A suggested you move the function to the right side of your where clause for hrrss.StatusDate, the same applies to SourceEmailId, as putting a function on the left prevents the use of an index on these columns.

However, ISNULL() is a bit more tricky to resolve, and there are several possible ways it could be addressed.

Consider if the column should really allow NULLS, and if altering the column to not allow NULLS is an option. Then your where clause would look like this.

AND hrb.SourceEmailId <> ''
AND hrr.SourceEmailId <> ''

It's also possible that SourceEmailId is always ether going to have a valid value, or be NULL. This would be preferred, as NULL should be used where a value is unknown. In which case you shouldn't be checking for <> ''. Simply check that email IS NOT NULL.

AND hrb.SourceEmailId IS NOT NULL
AND hrr.SourceEmailId IS NOT NULL

If option 1 and 2 are not an option, then consider a UNION result set. In this case, you'd write a query for hrb.SourceEmailId <> '' and UNION that to the results of a second query for hrb.SourceEmailId IS NOT NULL. Since you have checks for SourceEmailId on two different tables, it could mean as meany as four queries. However, don't get caught up on the fact it's more queries, and that that would somehow mean it'll be slower. If all 4 queries are properly tuned, and each run in 100ms, that's better than one combined query running in 5 minutes.

More details of the issues and possible work around to using ISNULL() can be found in the below links.

Once these changes have been applied, you'll have a query that can actually use indexes on these columns. At that point, I'd start reviewing your execution plans and indexes, and possibly looking at removing the DISTINCT. But, as long as you have several WHERE clauses in your query that are going to force a SCAN every time they execute, doing these things now won't yield much benefit.

CodePudding user response：

You search for a result in the future, is this correct? -Edit: i realised its just the last 5 minutes you are looking for so in this case you might just as well remove the function on the left and see if this prevents the index scan.

About the slow performance. your query (only focussing on the datediff here) is not sargable this way. SQL server will need compute the column in all the rows on the table first, always resulting in a table scan. Remove the function on the left side.

One way to get around this, is to get the results from the main table first in a sargable way, put in it a temptable and then use the temptable for the function and use its ids to get back to the maintable for the results. See below example.

IF OBJECT_ID('tempdb..#MyTableName') IS NOT NULL
    DROP TABLE #MyTableName

CREATE TABLE #MyTableName
(
    PK INT PRIMARY KEY IDENTITY (1,1) NOT NULL,
    ID INT,
    StatusDate DATETIME
)

INSERT INTO #MyTableName (ID,StatusDate ) 
SELECT
ID,StatusDate 
FROM dbo.basetable p
WHERE p.StatusDate >  GETUTCDATE() --narrow your date criteria as much as needed
GO


SELECT P.* FROM #MyTableName T 
JOIN dbo.basetable P
ON P.Id = T.ID
WHERE Datediff(MI, dateadd(mi, 330, T.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <= 5
OPTION (RECOMPILE)
;

If you can create a nonclustered index on your date column and see what it brings. In the way you wrote it, it will always scan the table but at least it has an index. In the sargable way that index will also help a bunch.