SELECT data first then UPDATE using clustered index or UPDATE directly?-CodePudding

There's table (in SQL SERVER) which have high concurrent access (Ex: Transactions), but I need to update some data in this table. But when filtering data that needed to update, there's no indexes associated with that column(s).

What would be the best approach for minimum Table/ Rows lock time?

Approach 1:

DECLARE @vt_TxnIds TABLE
(
    [Id] INT
)

/** Filter the required data first **/
INSERT INTO @vt_TxnIds
SELECT TXN.[Id]
FROM [Transactions] TXN WITH (NOLOCK) -- NOLOCK is fine in this case
LEFT JOIN [Account] ACC WITH (NOLOCK) 
    ON ACC.[Id] = TXN.[AccountId] AND
        ACC.[IsActive] = 0
WHERE TXN.[Status] = 1 -- This column is not indexed
    AND ACC.[Id] IS NULL

/** Then update by clustered Index **/
BEGIN TRAN; 

UPDATE [Transactions]
    SET [Status] = 5
WHERE [Id] IN ( -- [Id] is clustered index
    SELECT [Id]
    FROM @vt_TxnIds
)

COMMIT;

Approach 2:

BEGIN TRAN;

UPDATE TXN
    SET TXN.[Status] = 5
FROM [Transactions] TXN
LEFT JOIN [Account] ACC WITH (NOLOCK)
    ON ACC.[Id] = TXN.[AccountId] AND
        ACC.[IsActive] = 0
WHERE TXN.[Status] = 1 -- This column is not indexed
    AND ACC.[Id] IS NULL    

COMMIT;

I'm not considering about the execution time. For example in my case, it's okay that whole query take 15 seconds but table/ rows locked for 5 seconds. Rather than whole table locked for 10 seconds and query also take 10 seconds.

Could someone please suggest what's the best approach or any alternative approach that full-fill my requirement?

Many thanks!

Update: Creating new index is not an option.

CodePudding user response：

Presumably this update is required for your application to function correctly. When dealing with an overzealous datababase administrator (I didn't say "incompetent", did I? :-) you, the developer, get the application right and leave the DBA to sort out the performance and table-locking problems. They can always add an index later when your production code gets slow. To which you say "hey, good idea!" presuming they ask you.

The same logic holds true for NOLOCK. The DBA can tell you if that's necessary. (It probably isn't.) Leave it out of your work.

Your objective here is to minimize the time during which a table is locked, as you said. Your secondary objective is to minimize the number of rows involved in any particular UPDATE operation.

You can do that, in SQL Server, by using TOP (n) to control the number of rows. That means you do multiple UPDATEs and keep going until the job is done. This kind of thing will work. (not debugged.)

SET @batchsize = 100;
SET @count = 1;
WHILE @count > 0 BEGIN
  SET DEADLOCK_PRIORITY LOW;

  UPDATE TOP (@batchsize) TXN
         SET TXN.[Status] = 5
   FROM [Transactions] TXN
   LEFT JOIN [Account] ACC 
        ON ACC.[Id] = TXN.[AccountId] AND ACC.[IsActive] = 0
 WHERE TXN.[Status] = 1 
    AND ACC.[Id] IS NULL;

 SET @count = @@ROWCOUNT;

END;

This works because your UPDATE sets Transactions.Status to 5. Once a row has been updated, that same row won't be chosen again for update.

Setting the deadlock priority to low is a good idea for this sort of repeating operation. If somehow your update query causes a deadlock with other app software it tells SQL server to stop your query rather than others. Stopping your query doesn't matter: your update will catch the same rows the next time it runs.

Now, obviously, this doesn't update the whole table in a single ACID transaction. Instead it's each batch. But I suspect that will not damage your application, or your transactional code would have done the update in real time.

CodePudding user response：

The first option is pointless extra work, and does not conform to ACID properties.

The unmentioned Approach #3 is best:

Approach #2 is good as a starting point
Remove the transaction as it is only a single statement
Remove the NOLOCK hint as that will just cause incorrect results and weird errors
Convert the left-join to a NOT EXISTS which is often more efficient.

UPDATE TXN
    SET TXN.Status = 5
FROM Transactions TXN
WHERE TXN.Status = 1
  AND NOT EXISTS (SELECT 1
      FROM Account ACC
      WHERE ACC.Id = TXN.AccountId
        AND ACC.IsActive = 0
  );

For this to work efficiently, you will want indexes (either clustered or non-clustered)

TXN (Status, AccountId)
ACC (IsActive, Id)

Alternatively you can use filtered non-clustered indexes

TXN (AccountId) INCLUDE (Status) WHERE (Status = 1)
ACC (Id) INCLUDE (IsActive) WHERE (IsActive = 0)

If you want to prevent a lot of locking and/or you cannot add the indexes, you can do the update in a loop on a few rows at a time.

Note that a transaction is not used here, to prevent excessive locking. Obviously you cannot roll back each run of the loop once finished.

DECLARE @batchSize bigint = 1000;

WHILE (1=1)
BEGIN

    UPDATE TOP (@batchSize) TXN
        SET TXN.Status = 5
    FROM Transactions TXN
    WHERE TXN.Status = 1
      AND NOT EXISTS (SELECT 1
          FROM Account ACC
          WHERE ACC.Id = TXN.AccountId
            AND ACC.IsActive = 0
      );

    IF @@ROWCOUNT < @batchSize
        BREAK;

    WAITFOR DELAY '00:00:05';  -- or some other wait time

END;