I have a table where historical data is stored. The sample structure of the table is like,
id | config | userid | team | createdon |
---|---|---|---|---|
1 | "config-01" | "user-01" | "Dev" | "2022:08:28" |
2 | "config-02" | "user-01" | "Dev" | "2022:08:27" |
3 | "config-01" | "user-01" | "Dev" | "2022:08:26" |
4 | "config-01" | "user-01" | "QA" | "2022:08:25" |
5 | "config-01" | "user-01" | "QA" | "2022:08:24" |
6 | "config-20" | "user-01" | "QA" | "2022:08:23" |
7 | "config-01" | "user-01" | "QA" | "2022:08:22" |
8 | "config-01" | "user-01" | "Prod" | "2022:08:21" |
9 | "config-01" | "user-01" | "Prod" | "2022:08:20" |
10 | "config-11" | "user-02" | "Prod" | "2022:08:19" |
11 | "config-15" | "user-02" | "Prod" | "2022:08:18" |
Now, I need to query data based on user-id, so first 9 rows will be taken for user-id "user-01". From these 9 rows I need to take latest two rows(based on createdon) of each team. So the table as of now looks like this,
id | config | userid | team | createdon |
---|---|---|---|---|
1 | "config-01" | "user-01" | "Dev" | "2022:08:28" |
2 | "config-02" | "user-01" | "Dev" | "2022:08:27" |
4 | "config-01" | "user-01" | "QA" | "2022:08:28" |
5 | "config-01" | "user-01" | "QA" | "2022:08:27" |
8 | "config-01" | "user-01" | "Prod" | "2022:08:28" |
9 | "config-01" | "user-01" | "Prod" | "2022:08:27" |
Now I need to compare two rows of each team against their config column and say if there is a change between them. So the final response I am looking for will be,
userid | team | configchanged |
---|---|---|
"user-01" | "Dev" | TRUE |
"user-01" | "QA" | FALSE |
"user-01" | "Prod" | FALSE |
As of now I was able to reach at a point where I can see rows with same team together using this self join query,
select * from table t1, table t2
where t1.id <> t2.id and t1.team = t2.team and t1.userid = "user-01"
I need help on rest of the part.
Any help on generating a query to achieve this solution would be really helpful. Thank You
CodePudding user response:
Not sure why this is tagged both with MySQL and Presto, but here is approach for Presto/Trino.
You can use window functions (lag
in particular to compare "current" and "previous" records) and group by. Here is a little bit more generic approach (without filtering, but that is easy to add):
-- sample data
with dataset (id, config, userid, team, createdon) as (
values (1 , 'config-01', 'user-01', 'Dev', '2022:08:28'),
(2 , 'config-02', 'user-01', 'Dev', '2022:08:27'),
(3 , 'config-01', 'user-01', 'Dev', '2022:08:26'),
(4 , 'config-01', 'user-01', 'QA', '2022:08:25'),
(5 , 'config-01', 'user-01', 'QA', '2022:08:24'),
(6 , 'config-20', 'user-01', 'QA', '2022:08:23'),
(7 , 'config-01', 'user-01', 'QA', '2022:08:22'),
(8 , 'config-01', 'user-01', 'Prod','2022:08:21'),
(9 , 'config-01', 'user-01', 'Prod','2022:08:20'),
(10 , 'config-11', 'user-02', 'Prod','2022:08:19'),
(11 , 'config-15', 'user-02', 'Prod','2022:08:18')
)
-- query
select userid,
team,
max_by(configchanged, createdon) configchanged -- get last change status
from (
select *,
config != coalesce(lag(config) over (partition by userid, team order by createdon), config) configchanged -- coalesce is used just in case there is only one row
from dataset
)
group by userid, team
Output:
userid | team | configchanged |
---|---|---|
user-01 | QA | false |
user-01 | Prod | false |
user-01 | Dev | true |
user-02 | Prod | true |