I have a table called AuditLogs that has a log of every single user action and their timestamp.
All these timestamp logs are recorded in UTC with a time offset of zero. While the actual users performing these actions are in California, with UTC-8.
I am trying to write a report with this data, and all the data needs to be in Californa time, to correctly contextualize when users are performing certain actions.
This is what my query looks like
select distinct top 1000
Users.FirstName ' ' Users.LastName Name
,AuditRecords.username
,AuditRecords.subtype
,convert(varchar, timestamp at time zone 'Pacific Standard Time', 0) time
,timestamp
from
AuditRecords
join Users on AuditRecords.UserId = Users.Id
where
AuditRecords.subtype <> 'log%'
and DATEPART(dy,timestamp at time zone 'Pacific Standard Time') = datepart(dy,SYSDATETIMEOFFSET() at time zone 'Pacific Standard Time')-1
With the key line being
and DATEPART(dy,timestamp at time zone 'Pacific Standard Time') = datepart(dy,SYSDATETIMEOFFSET() at time zone 'Pacific Standard Time')-1
With that line, I am trying to retrieve all records from the previous date, not based on a rolling 24 hours. For example, if the report is run on November 9th, it doesn't matter if the report was run at 6 AM or 6 PM, it will return all records for November 8th.
The problem is, if I change that line to be without the
at time zone 'Pacific Standard Time'
in both places, the query runs a lot faster. However, when the time is adjusted to the correct timezone with that line, then the query becomes a lot slower.
Is there any more efficient way to do this?
Thanks a lot for any advice.
CodePudding user response:
Example:
Declare @startDate datetimeoffset = dateadd(day, datediff(day, 0, sysdatetime()) - 1, 0) At Time Zone 'Pacific Standard Time'
Select @startDate
, @startDate At Time Zone 'UTC'
select distinct top 1000
Users.FirstName ' ' Users.LastName Name
,AuditRecords.username
,AuditRecords.subtype
,convert(varchar, timestamp at time zone 'Pacific Standard Time', 0) time
,timestamp
from
AuditRecords
join Users on AuditRecords.UserId = Users.Id
where
AuditRecords.subtype <> 'log%'
and timestamp >= @startDate At Time Zone 'UTC'
and timestamp < dateadd(day, 1, @startDate) At Time Zone 'UTC'
Also - don't use functions on columns in a where clause, that blocks SQL Server from being able to utilize an index on that column.
One more thing - the logic you have wouldn't work on Jan 1st, since day of year would be 1 and 1 - 1 would equal 0 so no matches.
You also want to be careful defining the start date. If you try using UTC you could end up running the report for the wrong day, depending on when it is run and the offset for the instance where it is executed.
CodePudding user response:
We've narrowed the performance issue down to this expression:
DATEPART(dy,timestamp at time zone 'Pacific Standard Time') =
datepart(dy,SYSDATETIMEOFFSET() at time zone 'Pacific Standard Time')-1
The question also tells us that data is stored in UTC, and we know (via a comment) that the server runs in NY.
First of all, this is poor practice. Your servers should run using the same timezone as your data. Many people will tell you this means always keep everything UTC, but I don't take it to quite that extreme. If you're business is in entirely in California, and your data is stored in a datetime field in California time, it can be appropriate to run your servers using the same time zone. But if your data is stored in UTC, your database server should run UTC time, too, regardless of where the server is located. Fixing that (and I say "fix", because what you have really is broken) would entirely fix the issue in the question.
But expecting making that fix will be out of your hands, we can still help things somewhat.
Whenever you have a WHERE
conditional expression where one side involves a column from a table and the other side does not, you will always get MUCH better performance making ALL the adjustments to the side of the expression that does not reference any column. It's not close... we're talking multiple orders of magnitude difference.
That means you want to do whatever it takes to get the expression in question down to looking something like this:
timestamp = SOME_COMPLICATED_AND_LONG_EXPRESSION_HERE( SYSDATETIMEOFFSET() )
The big win you gain from this is the ability to use indexes from your timestamp
column. If you have to make adjustments to the timestamp
side of the expression, you're no longer working with the same value as the index. That cuts to the core of database performance, and it will make a night and day difference.
Worse, if you are not able to do this, any manipulation done to the timestamp
column must be computed for every row in the table... even rows you don't need, because the server can't know if a given value will be used or not until it's completed the conversion.
In this case you may not be able to avoid the DATEPART(dy...)
part of the expression. But you can make the adjustment from NY time to UTC time for the right side of the expression. This will let you skip the at time zone
part on the left side, which will be an immense help on it's own. It's worth noting here that thanks to government meddling over time, these time zone adjustment lookups are much more complicated than you may be ready to believe. Getting this down to one total time zone adjustment can greatly improve the query over needing to do the adjustment for every single row.