Every day I create a table that looks like this:
user_id | received_at | age_pref | ethnicity_pref |
---|---|---|---|
1 | 10:01 | 18-28 | open_to_all |
2 | 10:05 | 18-23 | open_to_all |
1 | 10:08 | 18-30 | open_to_all |
2 | 10:07 | 18-25 | Hispanic/Iatino |
3 | 10:09 | 56-33 | White |
It's a table that lists the actions a user takes from 10am-11am. As you can see, there are 3 distinct user IDs.
Using this, I'm trying to create another table using lag
to see if the previous value changed or not. However, the problem is that the first row is inaccurate because there's no way for me to measure if an attribute changed without the previous row before this set of data (maybe it occurred at 930am). How do I get the previous received_at
row for each user ID in this table, but only 1 for each user_id? I want the new table to look like this, where the new records are prepended at the beginning
user_id | received_at | age_pref | ethnicity_pref |
---|---|---|---|
1 | 9:48 | 20-30 | asian |
2 | 9:52 | 30-32 | white |
3 | 9:58 | 28-30 | open_to_all |
1 | 10:01 | 18-28 | open_to_all |
2 | 10:05 | 18-23 | open_to_all |
1 | 10:08 | 18-30 | open_to_all |
2 | 10:07 | 18-25 | Hispanic/Iatino |
3 | 10:09 | 56-33 | White |
note there are several rows that exist before this time interval. I want the most recent one prepended to the table for the user_ids that exist in the table.
basically I want to include 1 more row for EACH user_id before the time window so that my table that tracks changes is accurate since lag will always have the first row be null.
CodePudding user response:
I guess you can union all
the following query:
select distinct on (user_id) user_id, received_at, age_pref, ethnicity_pref
from t
order by user_id, received_at desc
where received_at < '10:00'