Home > Mobile >  SQL - gap and island issue for more attributes
SQL - gap and island issue for more attributes

Time:03-29

I have following table, which apart from other attributes contains:

  • ID - unique identifier
  • Column1
  • Column2
  • Column3
  • CreatedDate - when the record has been created (based on ETL)
  • UpdatedDate - until when the record has been valid

Since there are other attributes apart from the 3 columns, which are being tracked for historical values, there might be cases, where there are multiple rows with the same values for all three columns for the same ID, but different timestamps in [CreatedDate] / [UpdatedDate]. Thus, the data may look like:

ID Column1 Column2 Column3 CreatedDate UpdatedDate
1122 T1 In Progress NULL 02/02/2022 18:39:38 29/03/2022 14:25:24
1122 T1 In Progress NULL 05/01/2022 10:45:50 02/02/2022 18:39:38
1122 T1 In Progress NULL 03/01/2022 12:11:47 05/01/2022 10:45:50
1122 T1 In Progress Yes 13/12/2021 21:43:44 03/01/2022 12:11:47
1122 T1 In Progress NULL 17/02/2021 14:12:15 13/12/2021 21:43:44
1122 T1 In Progress NULL 22/12/2020 14:38:32 17/02/2021 14:12:15
1122 T1 In Progress NULL 17/12/2020 18:38:38 22/12/2020 14:38:32
1122 T3 Ready NULL 30/03/2020 14:35:18 17/12/2020 18:38:38
1122 NULL Ready NULL 04/09/2019 18:33:24 30/03/2020 14:35:18
1122 T2 Ready NULL 07/01/2019 11:07:39 04/09/2019 18:33:24
1122 T2 Ready NULL 17/09/2018 14:31:17 07/01/2019 11:07:39
1122 T0 Ready NULL 28/08/2018 14:31:39 17/09/2018 14:31:17
1122 T0 Ready NULL 13/02/2018 14:48:44 28/08/2018 14:31:39

I would like to keep the unique values for all 3 columns in correct sequence, hence the ideal output should look like:

ID Column1 Column2 Column3 CreatedDate UpdatedDate
1122 T1 In Progress NULL 03/01/2022 12:11:47 29/03/2022 14:25:24
1122 T1 In Progress Yes 13/12/2021 21:43:44 03/01/2022 12:11:47
1122 T1 In Progress NULL 17/12/2020 18:38:38 13/12/2021 21:43:44
1122 T3 Ready NULL 30/03/2020 14:35:18 17/12/2020 18:38:38
1122 NULL Ready NULL 04/09/2019 18:33:24 30/03/2020 14:35:18
1122 T2 Ready NULL 17/09/2018 14:31:17 04/09/2019 18:33:24
1122 T0 Ready NULL 13/02/2018 14:48:44 17/09/2018 14:31:17

The below code works fine, if there is just one column, but it does not work for multiple columns as it returns all unique rows.

select ID, Column1, Column2, Column3,  min(createddate), max(updateddate)
from (select t.*,
             sum(case when prev_updatedate >= createddate then 0 else 1 end) over (partition by ID order by createddate) as grp
      from (select h.*,
                   max(updateddate) over (partition by ID order by createddate rows between unbounded preceding and 1 preceding) as prev_updatedate
            from #history h
           ) h
     ) h
group by ID, Column1, Column2, Column3, grp;

Any ideas how to fix it, please?

CodePudding user response:

You can try to use ROW_NUMBER window function to make the gap with your logic then you might get a grouping for gaps-and-islands

SELECT ID,Column1,Column2,Column3, min(createddate) CreatedDate, max(updateddate) UpdatedDate
FROM ( 
  select *,
      ROW_NUMBER() over (partition by ID order by createddate) - 
      ROW_NUMBER() over (partition by ID,Column1,Column2,Column3 order by createddate) grp
  from history
) t1
GROUP BY grp,ID,Column1,Column2,Column3
ORDER BY CreatedDate DESC

sqlfiddle

  • Related