Looking to find a way to clean up some data in a temporal table where bad Deletes/Inserts happened throughout the years. I was able to use NORMALIZE when the period values overlapped/met, however, I am now running into an issue where the periods do not overlap/meet. I would like to consolidate the periods.
Before:
After:
My original plan was to take the min(begin(eff_vt_dttm)) and max(end(eff_vt_dttm)) and create a new period after grouping by all columns. Problem is the edge case where the row values repeat, for example rows 1 and 2 should consolidate, but don't want rows 5 and 6 to consolidate with rows 1 and 2. Rows 1 and 2 should consolidate into their own row, rows 5 and 6 should consolidate into their own row as well. I was thinking of creating groupings so that I can take the min/max of the group, but had a hard time.
Grouping Idea:
Wondering if anyone knows a good approach for this problem.
Thanks!
CodePudding user response:
This is untested, for the reasons noted in my comment on your post.
It uses a concept called gaps-and-islands to identify consecutive records where the values remain unchanged. These form you 'groups' in the final image.
WITH
sorted AS
(
SELECT
your_data.*,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY BEGIN(eff_vt_dttm)
)
AS seq_num_id,
ROW_NUMBER() OVER (PARTITION BY id, col1, col2
ORDER BY BEGIN(eff_vt_dttm)
)
AS seq_num_id_col1_col2
FROM
your_data
)
SELECT
s.id,
s.col1,
s.col2,
PERIOD(
MIN(begin(s.eff_vt_dttm)),
MAX(end(s.eff_vt_dttm))
)
AS eff_vt_dttm
FROM
sorted AS s
GROUP BY
s.id,
s.col1,
s.col2,
s.seq_num_id - s.seq_num_id_col1_col2
ORDER BY
s.id,
MIN(begin(s.eff_vt_dttm))