I'm trying to create a simple user retention, I tried a lot of ways searching, but none of them seems efficient and most of them have a lot of calculations that I don't really need.
Input my table example: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ec68bebf63280023e828ebb6cabd2d89 OR:
DROP TABLE IF EXISTS users_churn;
create table
users_churn(
id SERIAL PRIMARY KEY,
users varchar(10) not null,
dates timestamp
);
insert into users_churn(users, dates)
values
('1ab7', '2022-01-04 10:22'),
('1ab7', '2022-04-26 18:30'),
('1ab7', '2022-05-12 20:10'),
('1ab7', '2022-07-02 20:55'),
('3ac5', '2022-02-05 05:12'),
('3ac5', '2022-04-09 07:17'),
('3ac5', '2022-07-03 04:19');
select * from users_churn
Output that I expect:
-------------------------------------------------
| --DESIRED OUTPUT |
-------------------------------------------------
| ('1ab7', '2022-01-04 10:22', 'first_purchase'), |
| ('1ab7', '2022-02-01 00:00', 'churn'), |
| ('1ab7', '2022-03-01 00:00', 'churn'), |
| ('1ab7', '2022-04-26 18:30', 'retained'), |
| ('1ab7', '2022-05-12 20:10', 'retained'), |
| ('1ab7', '2022-06-01 00:00', 'churn'), |
| ('1ab7', '2022-07-02 20:55', 'retained'), |
| ('3ac5', '2022-02-05 05:12', 'first_purchase'), |
| ('3ac5', '2022-03-01 00:00', 'churn'), |
| ('3ac5', '2022-04-09 07:17', 'retained'), |
| ('3ac5', '2022-05-01 00:00', 'churn'), |
| ('3ac5', '2022-06-01 00:00', 'churn'), |
| ('3ac5', '2022-07-03 04:19', 'retained'); |
-------------------------------------------------
CodePudding user response:
It's not clear what you want done if there is more that one purchase by a user in a month. This version keeps only the first one. If you want to keep all purchases, then get rid of the rn_month
stuff:
with churn as (
select c.users,
c.dates,
date_trunc('month', c.dates) as mdate,
row_number() over (partition by c.users
order by c.dates) as rn_all,
row_number() over (partition by c.users, date_trunc('month', c.dates)
order by c.dates) as rn_month
from users_churn c
), months as (
select generate_series(
min(c.mdate),
current_date,
interval '1 month'
)::timestamp as mdate
from churn c
)
select c.users,
coalesce(c1.dates, m.mdate) as dates,
case
when c1.rn_all = 1 then 'first_purchase'
when c1.users is null then 'churn'
else 'retained'
end as status
from churn c
join months m
on m.mdate >= c.mdate
and c.rn_all = 1
left join churn c1
on c1.users = c.users
and c1.mdate = m.mdate
and c.rn_month = 1
order by c.users, m.mdate;
db<>fiddle here