I have following table, which apart from other attributes contains:
- ID - unique identifier
- Column1
- Column2
- Column3
- CreatedDate - when the record has been created (based on ETL)
- UpdatedDate - until when the record has been valid
Since there are other attributes apart from the 3 columns, which are being tracked for historical values, there might be cases, where there are multiple rows with the same values for all three columns for the same ID, but different timestamps in [CreatedDate] / [UpdatedDate]. Thus, the data may look like:
ID | Column1 | Column2 | Column3 | CreatedDate | UpdatedDate |
---|---|---|---|---|---|
1122 | T1 | In Progress | NULL | 02/02/2022 18:39:38 | 29/03/2022 14:25:24 |
1122 | T1 | In Progress | NULL | 05/01/2022 10:45:50 | 02/02/2022 18:39:38 |
1122 | T1 | In Progress | NULL | 03/01/2022 12:11:47 | 05/01/2022 10:45:50 |
1122 | T1 | In Progress | Yes | 13/12/2021 21:43:44 | 03/01/2022 12:11:47 |
1122 | T1 | In Progress | NULL | 17/02/2021 14:12:15 | 13/12/2021 21:43:44 |
1122 | T1 | In Progress | NULL | 22/12/2020 14:38:32 | 17/02/2021 14:12:15 |
1122 | T1 | In Progress | NULL | 17/12/2020 18:38:38 | 22/12/2020 14:38:32 |
1122 | T3 | Ready | NULL | 30/03/2020 14:35:18 | 17/12/2020 18:38:38 |
1122 | NULL | Ready | NULL | 04/09/2019 18:33:24 | 30/03/2020 14:35:18 |
1122 | T2 | Ready | NULL | 07/01/2019 11:07:39 | 04/09/2019 18:33:24 |
1122 | T2 | Ready | NULL | 17/09/2018 14:31:17 | 07/01/2019 11:07:39 |
1122 | T0 | Ready | NULL | 28/08/2018 14:31:39 | 17/09/2018 14:31:17 |
1122 | T0 | Ready | NULL | 13/02/2018 14:48:44 | 28/08/2018 14:31:39 |
I would like to keep the unique values for all 3 columns in correct sequence, hence the ideal output should look like:
ID | Column1 | Column2 | Column3 | CreatedDate | UpdatedDate |
---|---|---|---|---|---|
1122 | T1 | In Progress | NULL | 03/01/2022 12:11:47 | 29/03/2022 14:25:24 |
1122 | T1 | In Progress | Yes | 13/12/2021 21:43:44 | 03/01/2022 12:11:47 |
1122 | T1 | In Progress | NULL | 17/12/2020 18:38:38 | 13/12/2021 21:43:44 |
1122 | T3 | Ready | NULL | 30/03/2020 14:35:18 | 17/12/2020 18:38:38 |
1122 | NULL | Ready | NULL | 04/09/2019 18:33:24 | 30/03/2020 14:35:18 |
1122 | T2 | Ready | NULL | 17/09/2018 14:31:17 | 04/09/2019 18:33:24 |
1122 | T0 | Ready | NULL | 13/02/2018 14:48:44 | 17/09/2018 14:31:17 |
The below code works fine, if there is just one column, but it does not work for multiple columns as it returns all unique rows.
select ID, Column1, Column2, Column3, min(createddate), max(updateddate)
from (select t.*,
sum(case when prev_updatedate >= createddate then 0 else 1 end) over (partition by ID order by createddate) as grp
from (select h.*,
max(updateddate) over (partition by ID order by createddate rows between unbounded preceding and 1 preceding) as prev_updatedate
from #history h
) h
) h
group by ID, Column1, Column2, Column3, grp;
Any ideas how to fix it, please?
CodePudding user response:
You can try to use ROW_NUMBER
window function to make the gap with your logic then you might get a grouping for gaps-and-islands
SELECT ID,Column1,Column2,Column3, min(createddate) CreatedDate, max(updateddate) UpdatedDate
FROM (
select *,
ROW_NUMBER() over (partition by ID order by createddate) -
ROW_NUMBER() over (partition by ID,Column1,Column2,Column3 order by createddate) grp
from history
) t1
GROUP BY grp,ID,Column1,Column2,Column3
ORDER BY CreatedDate DESC