I have a table users, I want to track their valid from and valid to dates for each profile user worked on and last valid to as null for the existing profile.
select id,email,ROLE,PROFILE,
min_last_modified_date as valid_from, lead(min_last_modified_date) over (partition by
id order by min_last_modified_date) as valid_to
from
(
select id,email,ROLE,PROFILE,
min(last_modified_date) as min_last_modified_date,
max(last_modified_date) as max_last_modified_date
from users_table
group by 1,2,3,4
)
[sample table]
ID | ROLE | EMPLOYEE_NUMBER | PROFILE | LAST_MODIFIED_DATE |
---|---|---|---|---|
1 | Managers | U001 | General Manager | 2020-03-24 0:10:35 |
1 | Managers | U001 | General Manager | 2020-04-13 13:23:22 |
1 | Managers | U001 | General Manager | 2020-04-14 23:39:53 |
1 | Managers | U001 | General Manager | 2020-05-04 17:06:57 |
1 | Managers | U001 | General Manager | 2020-06-30 0:21:23 |
1 | Managers | U001 | National Acquisition Lead | 2020-06-30 21:13:53 |
1 | Managers | U001 | General Manager | 2020-11-24 11:54:33 |
1 | Managers | U001 | General Manager | 2020-12-03 0:32:42 |
1 | Managers | U001 | General Manager | 2022-02-23 1:52:10 |
1 | Managers | U001 | General Manager | 2022-04-29 19:29:49 |
[getting output like this]
ID | ROLE | PROFILE | VALID_FROM | VALID_TO | |
---|---|---|---|---|---|
1 | abc | Outside Sales Managers | General Manager | 2020-03-24 0:10:35 | 2021-02-25 22:20:11 |
1 | abc | Outside Sales Managers | National Acquisition Lead | 2021-02-25 22:20:11 | Null |
[Expected output]
ID | ROLE | PROFILE | VALID_FROM | VALID_TO | |
---|---|---|---|---|---|
1 | abc | Outside Sales Managers | General Manager | 2020-03-24 0:10:35 | 2021-02-25 22:20:11 |
1 | abc | Outside Sales Managers | National Acquisition Lead | 2021-02-25 | 22:20:11 2021-03-31 22:39:50 |
1 | abc | Outside Sales Managers | General Manager | 2021-03-31 22:39:50 | NULL |
CodePudding user response:
One of the approaches can be to identify concurrent rows with same profile into a same bucket(i.e. group).
First distinguish each row based on the next profile and prev profile it encounters.
If the previous or next profile is null(i.e. first or last entry in the table) , default it to the current profile.
Then your window function can do the proper partitions.
select
distinct
ID
, email
, role
, profile
, min(last_modified_date) over (partition by ID, email, role, adjacent_role_ind order by last_modified_date) as valid_from
, lead(last_modified_date,1,null) over (partition by ID, email, role, adjacent_role_ind order by last_modified_date) as valid_to
from
(
select
ID
, email
, role
, profile
, last_modified_date
, sum(prev_profile) over ( order by last_modified_date) as adjacent_role_ind
from
(
select
ID
, email
, role
, profile
, last_modified_date
, case
when lag(profile, 1, 0) over (order by last_modified_date) != profile
then 1
else 0
end as prev_profile
from users
)
;
You can format the above query into a CTE for a clear code block and optimal execution.
CodePudding user response:
Here is a query adapted from the logic laid out in @shubh_8175's answer. I tried to edit their answer because the logic is sound and only small changes needed to be made to get the correct output but the queue is full
- Detect profile changes over time per id
- Group profile changes within id
- Get the valid dates within those profile groups
- Aggregate to remove the changes within profile groups
with cte as (
select
[id]
,[email]
,[role]
,[profile]
,[last_modified_date]
,case when lag([profile]) over (order by [last_modified_date]) = [profile] then 0 else 1 end as [new_profile]
from users_table
),
profile_grouping as (
select
[id]
,[email]
,[role]
,[profile]
,[last_modified_date]
,sum([new_profile]) over ( order by [last_modified_date]) as [profile_n]
from cte
),
valid_dates as (
select
[id]
,[email]
,[role]
,[profile]
,[last_modified_date]
,[profile_n]
,min(last_modified_date) over (partition by [id], [profile_n] order by [last_modified_date]) as [valid_from]
,lead(last_modified_date) over (partition by [id] order by [profile_n]) as [valid_to]
from profile_grouping
)
select
[id]
,[email]
,[role]
,[profile]
,[profile_n]
,min([valid_from]) [valid_from]
,case when [profile_n] = max([profile_n]) over ( partition by [id] ) then null else max([valid_to]) end [valid_to]
from valid_dates
group by
[id]
,[email]
,[role]
,[profile]
,[profile_n]
order by [id], [valid_from]