This is an example of the data structure in my SQL table.
In fact I have many users in my table, and some of them have their steps in the wrong order (user number 2 in the picture). How can I select all such users? The logic is to select all users whose sign_in date is earlier than their registration date. I suppose a regular WHERE clause won't work here. Maybe there is a special function for such cases?
CodePudding user response:
I can see two approaches to solving the problem. For reference, this is how I imagine the table might look:
create table users (
    user_id int,
    action text,
    date decimal
);
- Use a self join. Here we fetch the records with the 'registration' action and self-join them to the rows with a matching user_id and the 'sign_in' action. Because of the join, the data for both actions is available in the same row, which lets you compare them in the WHERE clause.
select u1.*
from users u1
join users u2 on u1.user_id = u2.user_id and u2.action = 'sign_in'
where u1.action = 'registration' and u2.date < u1.date;
- Use the crosstab* function of Postgres. It transposes rows into columns, which lets you compare the values in the WHERE clause. Personally I think this is more elegant and more extensible, in the sense that it allows other comparisons as well, if needed, without adding another join. Looking at the cost reported by EXPLAIN, this comes out as more efficient as well.
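SELECT *
FROM crosstab(
    $$select user_id, action, date
      from users
      order by user_id, action$$,
    -- fixed category list, so a missing action yields NULL instead of shifting columns
    $$values ('del_account'), ('registration'), ('sign_in')$$
) AS ct(user_id int, del_account decimal, registration decimal, sign_in decimal)
where sign_in < registration;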
*Note: to use crosstab you may need superuser access to the database in order to create the extension. You only need to run the following statement once:
CREATE EXTENSION IF NOT EXISTS tablefunc;
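If installing the extension is not an option, the same row-to-column idea can be sketched with plain conditional aggregation. The snippet below is only an illustration under assumptions: it runs the SQL through Python's sqlite3 module so it is self-contained, the sample rows are made up, and it takes the earliest date per action when an action occurs more than once.

```python
import sqlite3

# Hypothetical sample rows (user_id, action, date).
# User 2 signs in (date 1) before registering (date 2); user 3 never signs in.
ROWS = [
    (1, "registration", 1), (1, "sign_in", 2), (1, "del_account", 3),
    (2, "sign_in", 1), (2, "registration", 2),
    (3, "registration", 5),
]

# The inner query builds one row per user with one column per action,
# which is the shape crosstab would return; the outer WHERE compares them.
PIVOT_SQL = """
select user_id
from (
    select user_id,
           min(case when action = 'registration' then date end) as registration,
           min(case when action = 'sign_in' then date end) as sign_in
    from users
    group by user_id
) as ct
where sign_in < registration
order by user_id
"""


def users_signed_in_before_registration(rows):
    """Load rows into an in-memory table and return the offending user_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (user_id int, action text, date decimal)")
    conn.executemany("insert into users values (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(PIVOT_SQL)]
    conn.close()
    return result


print(users_signed_in_before_registration(ROWS))
```

A user with no sign_in row gets NULL in that column, so the comparison is NULL and the user is simply left out.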
Hope this helps. Let me know in the comments if anything is unclear.
CodePudding user response:
Your question is a bit vague, yet the problem is generic enough.
First let's make your actions comparable and sortable in the right sequence, for example '1.registration', '2.sign_in', '3.del_account' instead of 'registration', 'sign_in', 'del_account'. Even better, use action codes: 1 for registration, 2 for sign_in, etc.
Then you can detect misplaced actions and select the list of distinct user_ids who did them.
select distinct user_id
from (
    select user_id,
           action > lead(action) over (partition by user_id order by "date") as misplaced
    from the_table
) as t
where misplaced;
This approach would work for any number of action steps, not only 3.
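As a rough end-to-end sketch of the idea, the snippet below maps the action names to codes inline with a CASE expression and then applies the same lead() comparison. Everything around the query is an assumption for illustration: the sample rows are invented, and the SQL is run through Python's sqlite3 module (which needs SQLite 3.25 or newer for window functions) only so that it is self-contained.

```python
import sqlite3

# Hypothetical sample rows (user_id, action, date).
# User 2 signs in before registering; user 3 deletes the account before signing in.
ROWS = [
    (1, "registration", 1), (1, "sign_in", 2), (1, "del_account", 3),
    (2, "sign_in", 1), (2, "registration", 2),
    (3, "registration", 1), (3, "del_account", 2), (3, "sign_in", 3),
]

# Innermost query: turn each action into a sortable code.
# Middle query: flag a row when its code is greater than the next row's code by date.
# Outer query: keep the users with at least one flagged row.
MISPLACED_SQL = """
select distinct user_id
from (
    select user_id,
           code > lead(code) over (partition by user_id order by date) as misplaced
    from (
        select user_id, date,
               case action when 'registration' then 1
                           when 'sign_in' then 2
                           when 'del_account' then 3
               end as code
        from the_table
    ) as coded
) as t
where misplaced
order by user_id
"""


def users_with_misplaced_actions(rows):
    """Load rows into an in-memory table and return user_ids with out-of-order steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table the_table (user_id int, action text, date decimal)")
    conn.executemany("insert into the_table values (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(MISPLACED_SQL)]
    conn.close()
    return result


print(users_with_misplaced_actions(ROWS))
```

The last action of each user has no following row, so lead() returns NULL there and the row is never flagged.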
CodePudding user response:
If you map the action column to its expected position with a CASE expression and compare that with the actual position by date, you can find the users whose sign_in date is earlier than their registration date.
https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=1e112d51825f5d3185e445d97d4e9c78
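select *
from (
    select
        -- udid: actual position of the row within the user's history, by date
        row_number() over (partition by user_id order by date) as udid,
        -- stsord: position the action is expected to have
        case when action = 'registration' then 1
             when action = 'sign_in' then 2
             when action = 'delete' then 3
             else 4
        end as stsord,
        *
    from duptuser
) as drt
where stsord != udid;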