Home > Net >  Creating conditional fields based upon previous month values
Creating conditional fields based upon previous month values

Time:02-25

I'm currently doing some work on a dataset that contains monthly data about the subscription status of users, where a given row shows whether a user is subscribed on a given month(shown as truncated date) in the following format:

 ------------ ------ --------------------- 
|   month    | user | subscription_status |
 ------------ ------ --------------------- 
| 2022-02-01 | 2345 | true                |
| 2022-01-01 | 2345 | false               |
| 2021-12-01 | 2345 | false               |
| 2022-02-01 | 4578 | true                |
| 2022-02-01 | 9548 | false               |
| 2022-02-01 | 1212 | true                |
|......                                   |
 ------------ ------ --------------------- 

What I would like to do is query this data, but also generate a number of extra boolean fields whose values are conditional on the previous monthly subscription data for a given user, e.g:

  • never_subscribed_ever(subscription_status is false for each month for a user)
  • never_subscribed_last_6_months (subscription_status is false for the last 6 months only)
  • first_subscribed (subscription_status is true for the first time)
  • resubscribed (subscription_status is true after being false last month)

to give a table such as :


 ------------ ------ --------------------- ----------------------- -------------------------------- ------------------ -------------- 
|   month    | user | subscription_status | never_subscribed_ever | never_subscribed_last_6_months | first_subscribed | resubscribed |
 ------------ ------ --------------------- ----------------------- -------------------------------- ------------------ -------------- 

However I'm afraid that I am quite lost on how best to implement the logic which I plan to do using the lag function along with conditionals such as if/case when needed.

Can anyone help me get started on the best way to implement lag logic on a user level looking at various past timeframes per field?

CodePudding user response:

Actually this is a nice little homework of the usage of analytic functions.

Most of them are straightforward, only for the never_subscribed_last_6_months you need to apply a windowing clause to get the last 6 months.

Query

select 
month,
user_id,
subscription_status,
case when 
count(case when  SUBSCRIPTION_STATUS = 'true' then 1 end) 
   over (partition by user_id order by month) = 0 then 'Y' end as never_subscribed_ever,
case when
count(case when  SUBSCRIPTION_STATUS = 'true' then 1 end) 
   over (partition by user_id order by month 
   range between INTERVAL '6' MONTH preceding and current row) = 0 then 'Y' end as  never_subscribed_last_6_months,
min(case when  SUBSCRIPTION_STATUS = 'true' then month end) over (partition by user_id) as first_subscribed,
case when  SUBSCRIPTION_STATUS = 'true' and
 lag(SUBSCRIPTION_STATUS) over (partition by user_id order by MONTH) = 'false' then 'Y' end as resubscribed
from user_status
order by user_id, month desc;

Sample Data

create table user_status as 
select date'2022-02-01' month, 2345 user_id, 'true' subscription_status from dual union all
select date'2022-01-01' month, 2345 user_id, 'false' subscription_status from dual union all
select date'2021-12-01' month, 2345 user_id, 'false' subscription_status from dual union all
select date'2022-02-01' month, 4578 user_id, 'true' subscription_status from dual union all
select date'2022-02-01' month, 9548 user_id, 'false' subscription_status from dual union all
select date'2022-02-01' month, 1212 user_id, 'true' subscription_status from dual;
  • Related