So i am having an issue regarding a query to get the latest data based on status
For example. I have a table like these
Schema (PostgreSQL v12)
CREATE TABLE test(
id INTEGER,
user_id INTEGER,
product VARCHAR(20),
status VARCHAR(20)
);
INSERT INTO test VALUES(1,1,'WALK','ACTIVE');
INSERT INTO test VALUES(2,1,'RUN','ACTIVE');
INSERT INTO test VALUES(3,2,'WALK','INACTIVE');
INSERT INTO test VALUES(4,2,'RUN','ACTIVE');
INSERT INTO test VALUES(5,3,'WALK','UPDATING');
INSERT INTO test VALUES(6,3,'RUN','ACTIVE');
INSERT INTO test VALUES(7,4,'WALK','UPDATING');
INSERT INTO test VALUES(8,4,'RUN','INACTIVE');
INSERT INTO test VALUES(9,2,'RUN','UPDATING');
Query #1
SELECT * FROM test;
id | user_id | product | status |
---|---|---|---|
1 | 1 | WALK | ACTIVE |
2 | 1 | RUN | ACTIVE |
3 | 2 | WALK | INACTIVE |
4 | 2 | RUN | ACTIVE |
5 | 3 | WALK | UPDATING |
6 | 3 | RUN | ACTIVE |
7 | 4 | WALK | UPDATING |
8 | 4 | RUN | INACTIVE |
9 | 2 | RUN | UPDATING |
Basically, i need a query to get the latest of user_id status and the status only for active and updating. But, if active and updating is exists, they pick the active one. So from this table, the result should be
user_id | status |
---|---|
1 | ACTIVE |
2 | ACTIVE |
3 | ACTIVE |
4 | UPDATING |
i tried this, but it only return the latest value
SELECT distinct on (user_id) user_id, status
FROM test
ORDER BY user_id DESC
CodePudding user response:
Try:
With cte as (
select *,
row_number() over(partition by user_id order by id desc) row_num
from test
)
select user_id,status
from cte
where row_num=1;
Note that above query takes in consideration that max(id) is the latest value for each user_id
Another solution, use subquery to get the max(id) for each user:
select user_id,status
from test
where id in (select max(id)
from test
group by user_id
);
CodePudding user response:
using distinct on
SELECT distinct on (user_id) user_id, status
FROM test where status in ('ACTIVE','UPDATING')
ORDER BY user_id asc,id DESC
CodePudding user response:
We can user ROW_NUMBER() in a CTE with ORDER BY id DESC so that number 1 is the latest row.
with cte as
(select
user_id,
status,
row_number() over (partition by user_id order by id desc) rn
from test
where status in ('ACTIVE','UPDATING')
)
select * from cte
where rn = 1
order by user_id;
user_id | status | rn |
---|---|---|
1 | ACTIVE | 1 |
2 | ACTIVE | 1 |
3 | ACTIVE | 1 |
4 | UPDATING | 1 |
CodePudding user response:
Another option:
SELECT distinct on (user_id) user_id, status
FROM test
where status != 'INACTIVE'
ORDER BY user_id, array_position('{ACTIVE,UPDATING}', status)
When you do distinct on
the order by
is mandatory for the distinct field(s) (user_id
in this case), but you also need to specify the sort criteria after that to determine which distinct record to show.
In other words, you want to only show one record by user id, but HOW do you determine which one? That's why you need the second argument in the order by.