I'm trying to create a query with fields:
- data - Date sequence
- user_id - User ID
- is_paid - Did the user pay this month
- number_of_not_paid - the number of months since the last payment, if the payment is in this month, then 0
select
date,
user_id,
is_paid,
(case when is_paid
then 0
else sum(case when is_paid then 0 else 1 end) over (partition by user_id order by date)
end)
from data
The result I get is:
date | user_id | is_paid | num |
---|---|---|---|
2020-01-01 | 1 | true | 0 |
2020-02-01 | 1 | false | 1 |
2020-03-01 | 1 | false | 2 |
2020-04-01 | 1 | true | 0 |
2020-05-01 | 1 | false | 3 |
2020-06-01 | 1 | true | 0 |
And the result I would like to get is:
date | user_id | is_paid | num |
---|---|---|---|
2020-01-01 | 1 | true | 0 |
2020-02-01 | 1 | false | 1 |
2020-03-01 | 1 | false | 2 |
2020-04-01 | 1 | true | 0 |
2020-05-01 | 1 | false | 1 |
2020-06-01 | 1 | true | 0 |
How I can fix my query for correct result?
CodePudding user response:
You want to reset the delinquency timer every time the user pays, so first mark each time the user pays (boolean
can be cast to int
to be summed):
with runs as (
select date, user_id, is_paid,
sum(is_paid::int) over (partition by user_id
order by date) as run_number
from my_table
)
With these runs marked, you can then sum the preceding false
values within the (user_id, run_number)
window:
select date, user_id, is_paid,
sum((not is_paid)::int) over (partition by user_id, run_number
order by date) as num
from runs;
date | user_id | is_paid | num
:--------- | ------: | :------ | --:
2020-01-01 | 1 | t | 0
2020-02-01 | 1 | f | 1
2020-03-01 | 1 | f | 2
2020-04-01 | 1 | t | 0
2020-05-01 | 1 | f | 1
2020-06-01 | 1 | t | 0
db<>fiddle here