I've been searching many similar posts here but they seem not to meet my requirement to use SQL Server IDENTITY
.
I have a dataframe extracted from another data source which might include duplicate Name.
Dataframe
Date | Time | Name | Result |
---|---|---|---|
3/28/2022 | 7:00:00 | abc | 0.23 |
3/28/2022 | 7:00:00 | abc | 0.19 |
3/28/2022 | 7:05:00 | def | 0.36 |
3/28/2022 | 7:10:00 | ghi | 0.29 |
When I load to SQL Server, I need to create a Retest_Count column in order to indicate the test order. If the Name is duplicated and same timestamp, always set the newer one to be 0 and the older one to be 1 based on the 'original data order'. Like this:
SQL Table 1
Date | Time | Name | Result | Retest_Count |
---|---|---|---|---|
3/28/2022 | 7:00:00 | abc | 0.23 | 1 |
3/28/2022 | 7:00:00 | abc | 0.19 | 0 |
3/28/2022 | 7:05:00 | def | 0.36 | 0 |
3/28/2022 | 7:10:00 | ghi | 0.29 | 0 |
When I insert new data with same Name but different timestamp, always give 0 to the new data and the existing table need to get auto increment like this:
SQL Table 2
Date | Time | Name | Result | Retest_Count |
---|---|---|---|---|
3/28/2022 | 7:00:00 | abc | 0.23 | 2 |
3/28/2022 | 7:00:00 | abc | 0.19 | 1 |
3/29/2022 | 13:00:00 | abc | 0.18 | 0 |
3/28/2022 | 7:05:00 | def | 0.36 | 0 |
3/28/2022 | 7:10:00 | ghi | 0.29 | 1 |
3/29/2022 | 21:05:00 | ghi | 0.25 | 0 |
Is this possible to achieve by SQL Server script? Python solution is considered also.
CodePudding user response:
As I mentioned in the comments, this is far better suited for a VIEW
. You will need to ensure you have some kind of column in your table that always increases (such as an IDENTITY
) to achieve this though (as I note you have 2 tests that occur at the same time for the same Name, so there isn't a way to denote which is "first").
Then you can create a VIEW
like this:
CREATE VIEW dbo.YourView_RetestCounts AS
SELECT [Date],
[Time],
[Name],
Result,
ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY IdentityColumn DESC) - 1 AS Retest_Count
FROM dbo.YourTable;
If, however, your goal is to filter to the latest row, as your comments suggest, then use a VIEW
with a Get top 1 row of each group solution:
CREATE VIEW dbo.YourView_LatestTest AS
WITH CTE AS(
SELECT [Date],
[Time],
[Name],
Result,
ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY IdentityColumn DESC) AS Retest_Count
FROM dbo.YourTable)
SELECT [Date],
[Time],
[Name],
Result
FROM CTE
WHERE Retest_Count = 1;