I have a table listing subscription events. When there is a "NEW" event added to the table, it means either a new subscription from a brand new customer OR the renewal of a monthly subscription from an existing customer. I want to be able to be able to summarize the data by month and split it depending on whether that is a new customer or just a renewal.
I am looking for a formula that says "if the user_ID is unknown and the event is "NEW", then count 1
in the "new customer" column, otherwise 1
in the "renewal" column
SOURCE TABLE
User_id | Event | Date |
---|---|---|
2 | NEW | 26/9/2021 |
2 | NEW | 26/8/2021 |
1 | NEW | 15/8/2021 |
DESIRED OUTPUT
Sept 20: 1 renewal; 0 new subscriptions
Aug 20: 2 new subscriptions
CodePudding user response:
You may use a window function MIN
to determine the earliest subscription date for each user and compare that to determine whether they are a new user or not. You may then aggregate/sum on this to determine the number of new subscriptions or renewals per year and month.
SELECT
YEAR(`Date`) as `year`,
MONTH(`Date`) as `month`,
SUM(is_new=true) as `new subscriptions`,
SUM(is_new=false) as `renewals`
FROM (
SELECT
*,
`Date`=MIN(`Date`) OVER (PARTITION BY `User_id`) as is_new
FROM
events
WHERE
`Event`='NEW'
) e
GROUP BY
YEAR(`Date`),
MONTH(`Date`)
ORDER BY
YEAR(`Date`),
MONTH(`Date`);
year | month | new subscriptions | renewals |
---|---|---|---|
2021 | 8 | 2 | 0 |
2021 | 9 | 0 | 1 |
or if you are using a mysql version which does not support window functions you may perform a left join on a subquery that finds the earliest subscription date. Using the same logic, the we can determine and count the number of new and renewed subscriptions.
SELECT
YEAR(`Date`) as `year`,
MONTH(`Date`) as `month`,
SUM(new_sub.min_date IS NOT NULL) as `new subscriptions`,
SUM(new_sub.min_date IS NULL) as `renewals`
FROM
events e
LEFT JOIN (
SELECT
`User_id`,
MIN(`Date`) as min_date
FROM
events
WHERE
`Event`='NEW'
GROUP BY
`User_id`
) as new_sub ON e.`User_id`=new_sub.`User_id` AND
e.`Date`=new_sub.min_date
GROUP BY
YEAR(`Date`),
MONTH(`Date`)
ORDER BY
YEAR(`Date`),
MONTH(`Date`)
year | month | new subscriptions | renewals |
---|---|---|---|
2021 | 8 | 2 | 0 |
2021 | 9 | 0 | 1 |
View working demo on DB Fiddle
Let me know if this works for you.
CodePudding user response:
You can use ROW_NUMBER()
to identify if a row is the first one for each client.
For example you can do:
select
year(date) as y,
month(date) as m,
sum(case when rn = 1 then 1 else 0 end) as new_subscriptions,
sum(case when rn <> 1 then 1 else 0 end) as renewals
from (
select *, row_number() over(partition by user_id order by date) as rn
from t
where event = 'NEW'
) x
group by y, m
order by y, m
Result:
y m new_subscriptions renewals
----- -- ------------------ --------
2021 8 2 0
2021 9 0 1
See running example at DB Fiddle.