Home > Back-end >  Is it safe to use a table's primary key to determine order between SQL rows?
Is it safe to use a table's primary key to determine order between SQL rows?

Time:08-02

I have an SQL table with a primary key and a created_at timestamp column. Given that I have the primary key value of a specific row, and that I want to pull all of the rows created before (specifically the row created just before), is it safe to do this by querying all rows that have an ID that is less than the ID of the row under inspection, instead of comparing using the timestamp column?

EDIT: Are there any scenarios (regardless of how contrived) in which this might actually be a problem?

CodePudding user response:

Are there any scenarios (regardless of how contrived) ...

You don't specify DBMS. In SQL Server you might be talking about a column with IDENTITY property and a column with default of SYSUTCDATETIME - The value of SYSUTCDATETIME is determined in a "pre-execution" phase as it needs to be "runtime constant" for a query - certainly it is possible to imagine that Q1 inserts a million rows from a slow source with a blocking operator in the plan (like a sort) - meanwhile Q2 starts up and gets a later timestamp and inserts a single row before Q1 gets issued its identity values.

Indeed I was able to reproduce this exact scenario very easily with

CREATE TABLE dbo.Test(
ID INT IDENTITY PRIMARY KEY, 
TS DATETIME2 DEFAULT SYSUTCDATETIME(),
ZZ CHAR(1) NULL
)

Query 1

INSERT dbo.Test(ZZ)
SELECT TOP 1000000 LEFT(ROW_NUMBER() OVER (ORDER BY c1.column_id, c2.column_id ),1)
FROM sys.all_columns c1,sys.all_columns c2

Query 2 (Executed immediately after Query 1 starts running)

INSERT dbo.Test(ZZ) DEFAULT VALUES;

And top few results ordered by ID after both queries had finished (note row 1 has a later timestamp but earlier ID than the others)

 ---- ----------------------------- ------ 
| ID |             TS              |  ZZ  |
 ---- ----------------------------- ------ 
|  1 | 2022-08-01 15:28:23.3976093 | NULL |
|  2 | 2022-08-01 15:28:20.6515274 | 1    |
|  3 | 2022-08-01 15:28:20.6515274 | 2    |
|  4 | 2022-08-01 15:28:20.6515274 | 3    |
|  5 | 2022-08-01 15:28:20.6515274 | 4    |
 ---- ----------------------------- ------ 

CodePudding user response:

No. Don't use the primary key for determining the age of the records. You may currently have a primary key that is monotonically increasing, but you should not rely on that. You should not use the value of the primary key for anything other than looking up a record by that key.

Maybe in the future your primary key will be a random number.

Use the timestamp column.

  •  Tags:  
  • sql
  • Related