Given the following two tables
users
user_id name join_date
1 Jon 2020-02-14
2 Jane 2020-02-14
3 Jill 2020-02-15
4 Josh 2020-02-15
5 Jean 2020-02-16
6 Justin 2020-02-17
7 Jeremy 2020-02-18
events
user_id type access_date
1 F1 2020-03-01
2 F2 2020-03-02
2 P 2020-03-12
3 F2 2020-03-15
4 F2 2020-03-15
1 P 2020-03-16
3 P 2020-03-22
return the fraction of users, rounded to two decimal places, who accessed feature two (type: F2 in events table) and upgraded to premium within the first 30 days of signing up. It should give upgrade_rate 0.33.
Here is my attempt
;WITH users AS (
SELECT * FROM (
VALUES
(1, 'Jon', CAST('14-02-20' AS date)),
(2, 'Jane', CAST('14-02-20' AS date)),
(3, 'Jill', CAST('15-02-20' AS date)),
(4, 'Josh', CAST('15-02-20' AS date)),
(5, 'Jean', CAST('16-02-20' AS date)),
(6, 'Justin', CAST('17-02-20' AS date)),
(7, 'Jeremy', CAST('18-02-20' AS date))
) AS _ (user_id, name, join_date)
),
events AS (
SELECT * FROM (
VALUES
(1, 'F1', CAST('01-03-20' AS date)),
(2, 'F2', CAST('02-03-20' AS date)),
(2, 'P', CAST('12-03-20' AS date)),
(3, 'F2', CAST('15-03-20' AS date)),
(4, 'F2', CAST('15-03-20' AS date)),
(1, 'P', CAST('16-03-20' AS date)),
(3, 'P', CAST('22-03-20' AS date))
) AS _ (user_id, type, access_date)
),
feature_two_upg AS (
SELECT *
FROM events
WHERE type = 'F2'
),
premium_upg AS (
SELECT *
FROM events
WHERE type = 'P'
),
differ_date AS (
SELECT feature.user_id, premium.access_date
FROM feature_two_upg AS feature
INNER JOIN premium_upg AS premium
ON feature.user_id = premium.user_id
WHERE DATEDIFF(DAY, feature.access_date, premium.access_date) < 30
)
SELECT ROUND(AVG(CAST(CASE WHEN differ_date.user_id IS NOT NULL THEN 1.0 ELSE 0.0 END AS float)), 2) AS upgrade_rate
FROM users
LEFT JOIN differ_date
ON users.user_id = differ_date.user_id
Right now it's giving me 0.29 upgrade_rate and I'm wondering why
CodePudding user response:
So, I'm checking your query, let's show the result from table without the function
SELECT users.user_id, differ_date.user_id
FROM users
LEFT JOIN differ_date
ON users.user_id = differ_date.user_id
The result is this
users.user_id differ_date.user_id
1 NULL
2 2
3 3
4 NULL
5 NULL
6 NULL
7 NULL
Now checking your formula, each time the differ_date.user_id is not null you sum 1, the function avg count the 7 register then, is 2 / 7 = 0.28571428, round by 2 = 0.29
Best Regards
CodePudding user response:
When I run this, I get 0.00, UNLESS I comment out WHERE DATEDIFF(DAY, feature.access_date, premium.access_date) < 30 in differ_date, Then I get .29.