Home > front end >  How to set seed value and incremental value for identity column in Azure Synapse?
How to set seed value and incremental value for identity column in Azure Synapse?

Time:02-21

I have a table with DDL as below in the Azure Synapse Data Warehouse:

CREATE TABLE [trans_customer_cdm_ejkb].[cdm_file_process_history]
(
    [id] [int] IDENTITY(1,1) NOT NULL,
    [layer] [varchar](500) NOT NULL,
    [ingest_partition] [varchar](100) NULL,
    [status] [varchar](25) NULL,
    [last_update_time] [datetime2](7) NULL,
    [pipeline_run_id] [varchar](200) NULL
)
WITH
(
    DISTRIBUTION = HASH ( [ingest_partition] ),
    CLUSTERED COLUMNSTORE INDEX
)
GO

I tried to append values into this table 3 times with the sql below, the id column has a seed value of 45 and incremental value of 60 inspite of the IDENTITY(1,1).

DECLARE @now DATETIME2 = GETDATE()

INSERT INTO trans_customer_cdm_ejkb.cdm_file_process_history
(layer, ingest_partition, [status], last_update_time, pipeline_run_id)
VALUES ('src2stg', '2022-02-18-03', 'success', @now, 'Test')

SELECT * FROM trans_customer_cdm_ejkb.cdm_file_process_history

enter image description here Also, I check the seed value as 1 and incremental value as 1 with the SQL below. However, the table does not provide the id value as expected

SELECT  sm.name
,       tb.name
,       co.name
,       ic.seed_value
,       ic.increment_value
FROM        sys.schemas AS sm
JOIN        sys.tables  AS tb           ON  sm.schema_id = tb.schema_id
JOIN        sys.columns AS co           ON  tb.object_id = co.object_id
JOIN        sys.identity_columns AS ic  ON  co.object_id = ic.object_id
                                        AND co.column_id = ic.column_id
WHERE   sm.name = 'trans_customer_cdm_ejkb'
AND     tb.name = 'cdm_file_process_history'
;

enter image description here

How can I fix this issue ?

Kind regards,

CodePudding user response:

IDENTITY columns in Azure Synapse Analytics dedicated SQL pools do guarantee unique values but do not guarantee sequential values. The reason is because data is split across 60 distributions; each distribution has a unique set of identity values.

If it's important for you to have a sequential column then recreate the table without the IDENTITY property and change your INSERT statement to the following code which will produce a sequential ID:


DECLARE @now DATETIME2 = GETDATE()

INSERT INTO trans_customer_cdm_ejkb.cdm_file_process_history
(id, layer, ingest_partition, [status], last_update_time, pipeline_run_id)
SELECT
(SELECT ISNULL(MAX(id),0) FROM trans_customer_cdm_ejkb.cdm_file_process_history)  1 as id, 
 'src2stg', '2022-02-18-03', 'success', @now, 'Test'

SELECT * FROM trans_customer_cdm_ejkb.cdm_file_process_history

Since your code was just inserting a single row I did 1 but normally you would do ROW_NUMBER() OVER (ORDER BY [SomeColumn]).

  • Related