I'm attempting to return only the modified records in a table, using T-SQL.
This is what I've done thus far:
BEGIN
IF OBJECT_ID('tempdb..#Test') IS NOT NULL
DROP TABLE #Test
CREATE TABLE #Test
(SetName nvarchar(100),
[Timestamp] datetime,
Value smallint)
INSERT INTO #Test VALUES('Alpha', GETDATE(), 1)
INSERT INTO #Test VALUES('Alpha', GETDATE(), 0)
INSERT INTO #Test VALUES('Alpha', GETDATE(), 1)
INSERT INTO #Test VALUES('Beta', GETDATE(), 1)
INSERT INTO #Test VALUES('Beta', GETDATE(), 1)
INSERT INTO #Test VALUES('Beta', GETDATE(), 1)
INSERT INTO #Test VALUES('Gamma', GETDATE(), 1)
INSERT INTO #Test VALUES('Gamma', GETDATE(), 0)
INSERT INTO #Test VALUES('Gamma', GETDATE(), 1)
SELECT * FROM #Test
END;
Results:
SetName Timestamp Value
Alpha 2022-05-23 12:58:41.100 1
Alpha 2022-05-23 12:58:41.101 0
Alpha 2022-05-23 12:58:41.102 1
Beta 2022-05-23 12:58:41.103 1
Beta 2022-05-23 12:58:41.104 1
Beta 2022-05-23 12:58:41.105 1
Gamma 2022-05-23 12:58:41.106 1
Gamma 2022-05-23 12:58:41.107 0
Gamma 2022-05-23 12:58:41.108 1
Expected results:
Alpha 2022-05-23 12:58:41.101 0 -- Changed from 1 to 0
Alpha 2022-05-23 12:58:41.102 1 -- Changed from 0 to 1
Gamma 2022-05-23 12:58:41.107 0 -- Changed from 1 to 0
Gamma 2022-05-23 12:58:41.108 1 -- Changed from 0 to 1
The following statement returns all the 0 to 1, and 1 to 0 records and I don't understand why:
;WITH cte AS
(
SELECT
SetName, [Timestamp], Value, lag(Value, 1, -1) OVER (ORDER BY [Timestamp]) AS LastValue
FROM #Test
)
SELECT
SetName, [Timestamp], [Value]
FROM
cte
WHERE value <> LastValue
CodePudding user response:
This solution assumes that 2 rows for a single SetName
cannot have the same value for Timestamp
(2 rows for different values of SetName
is fine).
One method is to use gaps and islands to put the values into groups, and then omit group 0 from the data:
WITH CTE AS(
SELECT SetName,
[Timestamp], --timestamp is a deprecrated synonym for rowversion; I don't recommend it's use as a column name
[value],
ROW_NUMBER() OVER (PARTITION BY SetName ORDER BY Timestamp) -
ROW_NUMBER() OVER (PARTITION BY SetName, [Value] ORDER BY Timestamp) AS Grp
FROM #Test)
SELECT SetName,
[Timestamp], --timestamp is a deprecrated synonym for rowversion; I don't recommend it's use as a column name
[value]
FROM CTE
WHERE Grp != 0
ORDER BY [Timestamp];
If 2 rows can have the same value, then hopefully you have an always ascending value you can use in the ORDER BY
as well; otherwise you have no way of knowing what the "first" row is.
CodePudding user response:
To track the changes at whole dataset, you were almost there, you just need to remove the first row by using LastValue <> -1
.
WITH CTE AS
(
SELECT
SetName, [Timestamp], Value, lag(Value, 1, -1) OVER (ORDER BY [Timestamp]) AS LastValue
FROM #Test
)
SELECT
SetName, [Timestamp], [Value]
FROM
CTE
WHERE value<>Lastvalue
AND Lastvalue<> -1; --Add this filter to remove the rows which doesn't have any Lag Value
To track the changes at SetName
level.
WITH CTE AS
(
SELECT
SetName, [Timestamp], Value, Lag(Value, 1, -1) OVER (PARTITION BY SetName ORDER BY [Timestamp]) AS LastValue
FROM #Test
)
SELECT
SetName, [Timestamp], [Value]
FROM
CTE
WHERE Value<>LastValue
AND LastValue <> -1;
db<>fiddle: Try here
Note: From the given dataset, modified records at SetName level and at whole dataset is same.
Assumption: Each Timestamp value is different.