I'm currently doing some work on a dataset that contains monthly data about the subscription status of users, where a given row shows whether a user is subscribed on a given month(shown as truncated date) in the following format:
------------ ------ ---------------------
| month | user | subscription_status |
------------ ------ ---------------------
| 2022-02-01 | 2345 | true |
| 2022-01-01 | 2345 | false |
| 2021-12-01 | 2345 | false |
| 2022-02-01 | 4578 | true |
| 2022-02-01 | 9548 | false |
| 2022-02-01 | 1212 | true |
|...... |
------------ ------ ---------------------
What I would like to do is query this data, but also generate a number of extra boolean fields whose values are conditional on the previous monthly subscription data for a given user, e.g:
never_subscribed_ever
(subscription_status is false for each month for a user)never_subscribed_last_6_months
(subscription_status is false for the last 6 months only)first_subscribed
(subscription_status is true for the first time)resubscribed
(subscription_status is true after being false last month)
to give a table such as :
------------ ------ --------------------- ----------------------- -------------------------------- ------------------ --------------
| month | user | subscription_status | never_subscribed_ever | never_subscribed_last_6_months | first_subscribed | resubscribed |
------------ ------ --------------------- ----------------------- -------------------------------- ------------------ --------------
However I'm afraid that I am quite lost on how best to implement the logic which I plan to do using the lag
function along with conditionals such as if
/case
when needed.
Can anyone help me get started on the best way to implement lag logic on a user level looking at various past timeframes per field?
CodePudding user response:
Actually this is a nice little homework of the usage of analytic functions.
Most of them are straightforward, only for the never_subscribed_last_6_months
you need to apply a windowing clause to get the last 6 months.
Query
select
month,
user_id,
subscription_status,
case when
count(case when SUBSCRIPTION_STATUS = 'true' then 1 end)
over (partition by user_id order by month) = 0 then 'Y' end as never_subscribed_ever,
case when
count(case when SUBSCRIPTION_STATUS = 'true' then 1 end)
over (partition by user_id order by month
range between INTERVAL '6' MONTH preceding and current row) = 0 then 'Y' end as never_subscribed_last_6_months,
min(case when SUBSCRIPTION_STATUS = 'true' then month end) over (partition by user_id) as first_subscribed,
case when SUBSCRIPTION_STATUS = 'true' and
lag(SUBSCRIPTION_STATUS) over (partition by user_id order by MONTH) = 'false' then 'Y' end as resubscribed
from user_status
order by user_id, month desc;
Sample Data
create table user_status as
select date'2022-02-01' month, 2345 user_id, 'true' subscription_status from dual union all
select date'2022-01-01' month, 2345 user_id, 'false' subscription_status from dual union all
select date'2021-12-01' month, 2345 user_id, 'false' subscription_status from dual union all
select date'2022-02-01' month, 4578 user_id, 'true' subscription_status from dual union all
select date'2022-02-01' month, 9548 user_id, 'false' subscription_status from dual union all
select date'2022-02-01' month, 1212 user_id, 'true' subscription_status from dual;