I am writing a SQL query which gives me a slow performance. Because of DATEDIFF function that it gives me no any result into mails. Please help me to remake this query so that my output results faster. I will put the query below
SELECT DISTINCT isnull(hrr.SourceEmailID,'')
,''
,''
,hrr.RID
,hrr.ResID
,hrr.ReqID
,'Interview Rejected To Employee'
,(
SELECT TOP 1
RID
FROM HCM_TEMPLATE_LIBRARY WITH (NOLOCK)
WHERE Title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
)
,GETUTCDATE()
,hrr.CreatedUserID
,0
FROM hc_resume_bank hrb WITH (NOLOCK)
INNER JOIN hc_req_resume hrr WITH (NOLOCK)
ON hrr.resid = HRB.rid
INNER JOIN HC_REQ_RESUME_STAGE_STATUS hrrss WITH (NOLOCK) ON hrrss.ReqResID = hrr.RID
INNER JOIN HCM_RESUME_SOURCE hrs WITH (NOLOCK) ON hrs.RID = hrr.SourceID
INNER JOIN HC_REQ_RES_INTERVIEW_STAGES hrris ON hrris.ReqResId = hrr.RID
WHERE hrrss.stageid = 4
AND hrrss.statusid = 9
AND hrr.SourceID = 4
AND isnull(hrb.SourceEmailId, '') <> ''
AND isnull(hrr.SourceEmailId, '') <> ''
and hrr.AddedType=10
AND Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5
CodePudding user response:
Assuming that you have established that datediff is the root cause of poor performance, I suggest changing this:
Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5
to this:
hrrss.StatusDate >= DATEADD(MI, -5, GETDATE())
This assumes dates in StatusDate are same timezone as the server.
CodePudding user response:
DISTINCT
in a query is almost always an indicator for a badly written query, where the author joins a lot of tables, builds a huge intermediate result thus that they must then boil down to its real size with DISTINCT
. This is a costly operation. It seems to apply to your query. If you simply want to make sure that the hc_req_resume.resid has an entry in hc_resume_bank with a sourceemailid, then use EXISTS
or IN
for this lookup, not a join.
Your query with appropriate lookup clauses:
SELECT
ISNULL(hrr.sourceemailid,'')
,''
,''
,hrr.rid
,hrr.resid
,hrr.reqid
,'Interview Rejected To Employee'
,(
SELECT TOP 1
rid
FROM hcm_template_library
WHERE title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
)
,GETUTCDATE()
,hrr.createduserid
,0
FROM hc_req_resume hrr
WHERE hrr.sourceid = 4
AND hrr.addedtype = 10
AND hrr.resid IN
(
SELECT hrb.rid
FROM hc_resume_bank hrb
WHERE hrb.sourceemailid <> ''
)
AND hrr.rid IN
(
SELECT hrrss.reqresid
FROM hc_req_resume_stage_status hrrss
WHERE hrrss.stageid = 4
AND hrrss.statusid = 9
AND hrrss.statusdate >= DATEADD(MI, -5, GETUTCDATE())
)
AND hrr.sourceid IN (SELECT hrs.rid FROM hcm_resume_source hrs)
AND hrr.rid IN (SELECT hrris.reqresid FROM hc_req_res_interview_stages);
The naming of the columns doesn't make things easier here. Why is the column sometimes called rid and sometimes reqresid? And then I see a rid combined with a resid. Is this just yet another name for the same thing? Or are there two meanings of a rid? And what is the table called the ID actually refers to? Is there a table called r or reqres or res? It doesn't seem so, but why does the ID of the table have a different name from the table, so the reader must guess what is what? We cannot even make much of a guess, if it is possible for a rid not to have a match in hc_req_res_interview_stages or for a sourceid not to have a match in hcm_resume_source. Usually you have a foreign key constraint on IDs, so either the ID is null (if this is allowed) or it does have a match in the parent table. A lookup would we pointless. Is it in your query? Or arent those tables the parent tables, but just other child tables refering to the same parent?
Remove any lookups that are not needed. The lookups in hcm_resume_source and hc_req_res_interview_stages may be such candidates, but I cannot know.
At last you want appropriate indexes. For hc_req_resume this may be something like
create index idx1 on hc_req_resume (sourceid, addedtype, rid, resid);
Then you may want:
create index idx2 on hc_resume_bank (rid) where sourceemailid <> '';
create index idx3 on hc_req_resume_stage_status (stageid, statusid, statusdate, reqresid);
The order of the columns in the indexes should be adjusted according to their selectivity.
CodePudding user response:
Salmon A has a great answer that I'd like to expand on.
Similar to why Salman A suggested you move the function to the right side of your where clause for hrrss.StatusDate
, the same applies to SourceEmailId
, as putting a function on the left prevents the use of an index on these columns.
However, ISNULL()
is a bit more tricky to resolve, and there are several possible ways it could be addressed.
- Consider if the column should really allow NULLS, and if altering the column to not allow NULLS is an option. Then your where clause would look like this.
AND hrb.SourceEmailId <> ''
AND hrr.SourceEmailId <> ''
- It's also possible that SourceEmailId is always ether going to have a valid value, or be NULL. This would be preferred, as NULL should be used where a value is unknown. In which case you shouldn't be checking for
<> ''
. Simply check that emailIS NOT NULL
.
AND hrb.SourceEmailId IS NOT NULL
AND hrr.SourceEmailId IS NOT NULL
- If option 1 and 2 are not an option, then consider a
UNION
result set. In this case, you'd write a query forhrb.SourceEmailId <> ''
andUNION
that to the results of a second query forhrb.SourceEmailId IS NOT NULL
. Since you have checks forSourceEmailId
on two different tables, it could mean as meany as four queries. However, don't get caught up on the fact it's more queries, and that that would somehow mean it'll be slower. If all 4 queries are properly tuned, and each run in 100ms, that's better than one combined query running in 5 minutes.
More details of the issues and possible work around to using ISNULL() can be found in the below links.
- isnull-around-the-predicate-and-sargability
- What are different ways to replace ISNULL() in a WHERE clause that uses only literal values?
Once these changes have been applied, you'll have a query that can actually use indexes on these columns. At that point, I'd start reviewing your execution plans and indexes, and possibly looking at removing the DISTINCT
. But, as long as you have several WHERE
clauses in your query that are going to force a SCAN
every time they execute, doing these things now won't yield much benefit.
CodePudding user response:
You search for a result in the future, is this correct? -Edit: i realised its just the last 5 minutes you are looking for so in this case you might just as well remove the function on the left and see if this prevents the index scan.
About the slow performance. your query (only focussing on the datediff here) is not sargable this way. SQL server will need compute the column in all the rows on the table first, always resulting in a table scan. Remove the function on the left side.
One way to get around this, is to get the results from the main table first in a sargable way, put in it a temptable and then use the temptable for the function and use its ids to get back to the maintable for the results. See below example.
IF OBJECT_ID('tempdb..#MyTableName') IS NOT NULL
DROP TABLE #MyTableName
CREATE TABLE #MyTableName
(
PK INT PRIMARY KEY IDENTITY (1,1) NOT NULL,
ID INT,
StatusDate DATETIME
)
INSERT INTO #MyTableName (ID,StatusDate )
SELECT
ID,StatusDate
FROM dbo.basetable p
WHERE p.StatusDate > GETUTCDATE() --narrow your date criteria as much as needed
GO
SELECT P.* FROM #MyTableName T
JOIN dbo.basetable P
ON P.Id = T.ID
WHERE Datediff(MI, dateadd(mi, 330, T.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <= 5
OPTION (RECOMPILE)
;
If you can create a nonclustered index on your date column and see what it brings. In the way you wrote it, it will always scan the table but at least it has an index. In the sargable way that index will also help a bunch.