I am solving the following Hard LeetCode SQL question.
Link to Question: https://leetcode.com/problems/trips-and-users/
Question:
+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| id          | int      |
| client_id   | int      |
| driver_id   | int      |
| city_id     | int      |
| status      | enum     |
| request_at  | date     |
+-------------+----------+
id is the primary key for this table.
The table holds all taxi trips. Each trip has a unique id, while client_id and driver_id are foreign keys to the users_id at the Users table.
Status is an ENUM type of ('completed', 'cancelled_by_driver', 'cancelled_by_client').
+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| users_id    | int      |
| banned      | enum     |
| role        | enum     |
+-------------+----------+
users_id is the primary key for this table.
The table holds all users. Each user has a unique users_id, and role is an ENUM type of ('client', 'driver', 'partner').
banned is an ENUM type of ('Yes', 'No').
The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.
Write a SQL query to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between "2013-10-01" and "2013-10-03". Round Cancellation Rate to two decimal points.
Return the result table in any order.
The query result format is in the following example.
Trips table:
+----+-----------+-----------+---------+---------------------+------------+
| id | client_id | driver_id | city_id | status              | request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users table:
+----------+--------+--------+
| users_id | banned | role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+
Output:
+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+
Here's my code:
WITH Requests_Cancelled AS (
SELECT Trips.client_id as ID, Trips.request_at as Day, COUNT(*) as cancelled_count
FROM Trips
INNER JOIN Users ON
Trips.client_id = Users.users_id
WHERE Users.banned = "No" AND Users.role = "client" AND Trips.status = "cancelled_by_client"
GROUP BY Trips.request_at
UNION
SELECT Trips.driver_id as ID, Trips.request_at as Day, COUNT(*) as cancelled_count
FROM Trips
INNER JOIN Users ON
Trips.driver_id = Users.users_id
WHERE Users.banned = "No" AND Users.role = "driver" AND Trips.status = "cancelled_by_driver"
GROUP BY Trips.request_at
),
Requests_Total AS (
SELECT Trips.client_id as ID, Trips.request_at as Day, COUNT(*) as total_count
FROM Trips
INNER JOIN Users ON
Trips.client_id = Users.users_id
WHERE Users.banned = "No" AND Users.role = "client"
GROUP BY Trips.request_at
UNION
SELECT Trips.driver_id as ID, Trips.request_at as Day, COUNT(*) as total_count
FROM Trips
INNER JOIN Users ON
Trips.driver_Id = Users.users_id
WHERE Users.banned = "No" AND Users.role = "driver"
GROUP BY Trips.request_at
)
SELECT Requests_Total.Day, IFNULL(MAX(ROUND(Requests_Cancelled.cancelled_count/Requests_Total.total_count, 2)), 0) as 'Cancellation Rate'
FROM Requests_Cancelled
RIGHT JOIN Requests_Total ON
Requests_Cancelled.Day = Requests_Total.Day
GROUP BY Requests_Total.Day
ORDER BY Requests_Total.Day ASC;
The code passes the first testcase below:
Input: {"headers": {"Trips": ["id", "client_id", "driver_id", "city_id", "status", "request_at"], "Users": ["users_id", "banned", "role"]}, "rows": {"Trips": [["1", "1", "10", "1", "completed", "2013-10-01"], ["2", "2", "11", "1", "cancelled_by_driver", "2013-10-01"], ["3", "3", "12", "6", "completed", "2013-10-01"], ["4", "4", "13", "6", "cancelled_by_client", "2013-10-01"], ["5", "1", "10", "1", "completed", "2013-10-02"], ["6", "2", "11", "6", "completed", "2013-10-02"], ["7", "3", "12", "6", "completed", "2013-10-02"], ["8", "2", "12", "12", "completed", "2013-10-03"], ["9", "3", "10", "12", "completed", "2013-10-03"], ["10", "4", "13", "12", "cancelled_by_driver", "2013-10-03"]], "Users": [["1", "No", "client"], ["2", "Yes", "client"], ["3", "No", "client"], ["4", "No", "client"], ["10", "No", "driver"], ["11", "No", "driver"], ["12", "No", "driver"], ["13", "No", "driver"]]}}
Output: {"headers": ["Day", "Cancellation Rate"], "values": [["2013-10-01", 0.33], ["2013-10-02", 0.00], ["2013-10-03", 0.50]]}
Expected: {"headers": ["Day", "Cancellation Rate"], "values": [["2013-10-01", 0.33], ["2013-10-02", 0.00], ["2013-10-03", 0.50]]}
But does not pass the second testcase:
Input: {"headers": {"Trips": ["id", "client_id", "driver_id", "city_id", "status", "request_at"], "Users": ["users_id", "banned", "role"]}, "rows": {"Trips": [["1", "1", "10", "1", "cancelled_by_client", "2013-10-04"]], "Users": [["1", "No", "client"], ["10", "No", "driver"]]}}
Output: {"headers": ["Day", "Cancellation Rate"], "values": [["2013-10-04", 1.00]]}
Expected: {"headers":["Day","Cancellation Rate"],"values":[]}
I don't understand why NULL values are expected in the second testcase.
CodePudding user response:
Ah, I see what you are doing. Use the same line of thinking, but work per request date (request_at) instead of per client or driver id.
Get your cancellations
select request_at, count(*) as cancels, 0 as requests
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.status in ('cancelled_by_driver', 'cancelled_by_client')
and t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
Get your ride requests
select request_at, 0 as cancels, count(*) as requests
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
Union them together
select request_at, count(*) as cancels, 0 as requests
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.status in ('cancelled_by_driver', 'cancelled_by_client')
and t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
union all
select request_at, 0 as cancels, count(*) as requests
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
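Now each day in the range has one row with zero cancellations and its request count, and each day with at least one cancellation also has one row with its cancellation count and zero requests. A day without cancellations gets no row from the first query, which is fine because the sums in the final step treat it as zero.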
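To make the intermediate rows concrete, here is a small sketch that runs the unioned query on the sample data from the question. It uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, and the make_db helper is a hypothetical name introduced only for this illustration.

```python
import sqlite3

USERS = [(1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"),
         (4, "No", "client"), (10, "No", "driver"), (11, "No", "driver"),
         (12, "No", "driver"), (13, "No", "driver")]
TRIPS = [
    (1, 1, 10, 1, "completed", "2013-10-01"),
    (2, 2, 11, 1, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, 6, "completed", "2013-10-01"),
    (4, 4, 13, 6, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, 1, "completed", "2013-10-02"),
    (6, 2, 11, 6, "completed", "2013-10-02"),
    (7, 3, 12, 6, "completed", "2013-10-02"),
    (8, 2, 12, 12, "completed", "2013-10-03"),
    (9, 3, 10, 12, "completed", "2013-10-03"),
    (10, 4, 13, 12, "cancelled_by_driver", "2013-10-03"),
]

def make_db(users, trips):
    """Hypothetical helper: build an in-memory SQLite copy of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (users_id INTEGER PRIMARY KEY, banned TEXT, role TEXT)")
    conn.execute("CREATE TABLE Trips (id INTEGER PRIMARY KEY, client_id INTEGER, "
                 "driver_id INTEGER, city_id INTEGER, status TEXT, request_at TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO Trips VALUES (?, ?, ?, ?, ?, ?)", trips)
    return conn

UNION_SQL = """
select request_at, count(*) as cancels, 0 as requests
from trips t
join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
where t.status in ('cancelled_by_driver', 'cancelled_by_client')
  and t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
union all
select request_at, 0 as cancels, count(*) as requests
from trips t
join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
where t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
"""

conn = make_db(USERS, TRIPS)
# Sort in Python so the output order does not depend on the engine.
rows = sorted(conn.execute(UNION_SQL).fetchall())
for row in rows:
    print(row)
```

On this data 2013-10-02 contributes only a requests row, because neither of its two trips with unbanned users was cancelled.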
Final result
select request_at as "Day",
round(sum(cancels) / sum(requests), 2) as "Cancellation Rate"
from
(
select request_at, count(*) as cancels, 0 as requests
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.status in ('cancelled_by_driver', 'cancelled_by_client')
and t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
union all
select request_at, 0 as cancels, count(*) as requests
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
) main
group by request_at
That'll give you what you want.
This may also be faster, since cancellations per day come from one query and requests per day from another, so the final result needs no further joins.
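As a quick check, the final query can be run against the sample data from the question. This is a sketch using Python's sqlite3 module rather than MySQL, so two things are assumptions of the illustration: the make_db helper is a hypothetical name, and the 1.0 * factor is added because SQLite would otherwise do integer division.

```python
import sqlite3

USERS = [(1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"),
         (4, "No", "client"), (10, "No", "driver"), (11, "No", "driver"),
         (12, "No", "driver"), (13, "No", "driver")]
TRIPS = [
    (1, 1, 10, 1, "completed", "2013-10-01"),
    (2, 2, 11, 1, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, 6, "completed", "2013-10-01"),
    (4, 4, 13, 6, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, 1, "completed", "2013-10-02"),
    (6, 2, 11, 6, "completed", "2013-10-02"),
    (7, 3, 12, 6, "completed", "2013-10-02"),
    (8, 2, 12, 12, "completed", "2013-10-03"),
    (9, 3, 10, 12, "completed", "2013-10-03"),
    (10, 4, 13, 12, "cancelled_by_driver", "2013-10-03"),
]

def make_db(users, trips):
    """Hypothetical helper: build an in-memory SQLite copy of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (users_id INTEGER PRIMARY KEY, banned TEXT, role TEXT)")
    conn.execute("CREATE TABLE Trips (id INTEGER PRIMARY KEY, client_id INTEGER, "
                 "driver_id INTEGER, city_id INTEGER, status TEXT, request_at TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO Trips VALUES (?, ?, ?, ?, ?, ?)", trips)
    return conn

FINAL_SQL = """
select request_at as Day,
       -- 1.0 * forces real division in SQLite; MySQL's / already does this
       round(1.0 * sum(cancels) / sum(requests), 2) as "Cancellation Rate"
from (
    select request_at, count(*) as cancels, 0 as requests
    from trips t
    join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
    join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
    where t.status in ('cancelled_by_driver', 'cancelled_by_client')
      and t.request_at between '2013-10-01' and '2013-10-03'
    group by request_at
    union all
    select request_at, 0 as cancels, count(*) as requests
    from trips t
    join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
    join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
    where t.request_at between '2013-10-01' and '2013-10-03'
    group by request_at
) per_day
group by request_at
order by request_at
"""

conn = make_db(USERS, TRIPS)
result = conn.execute(FINAL_SQL).fetchall()
for day, rate in result:
    print(day, f"{rate:.2f}")
```

The printed rates are 0.33, 0.00 and 0.50 for the three days, matching the expected output in the question.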
Example
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=2b5dc7262d5c7839ddb39ab02e365821
Compact version based on this
select request_at as "Day",
round(
coalesce(count( if(t.status != 'completed', 1.0, null) ), 0.0)
/
coalesce(count(*), 0.0)
, 2) as "Cancellation Rate"
from trips t
join users uc on t.client_id = uc.users_id and 'No' = uc.banned and 'client' = uc.role
join users ud on t.driver_id = ud.users_id and 'No' = ud.banned and 'driver' = ud.role
where t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8a223073878fab69b3859f7853609844
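The compact version relies on conditional aggregation: count() skips NULLs, so counting an expression that is NULL for completed trips yields the number of cancellations in the same pass that counts all requests. Below is a sketch of the same idea in Python's sqlite3 module. The assumptions here are the hypothetical make_db helper, a portable CASE expression standing in for MySQL's if(), and a 1.0 * factor to avoid SQLite's integer division.

```python
import sqlite3

USERS = [(1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"),
         (4, "No", "client"), (10, "No", "driver"), (11, "No", "driver"),
         (12, "No", "driver"), (13, "No", "driver")]
TRIPS = [
    (1, 1, 10, 1, "completed", "2013-10-01"),
    (2, 2, 11, 1, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, 6, "completed", "2013-10-01"),
    (4, 4, 13, 6, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, 1, "completed", "2013-10-02"),
    (6, 2, 11, 6, "completed", "2013-10-02"),
    (7, 3, 12, 6, "completed", "2013-10-02"),
    (8, 2, 12, 12, "completed", "2013-10-03"),
    (9, 3, 10, 12, "completed", "2013-10-03"),
    (10, 4, 13, 12, "cancelled_by_driver", "2013-10-03"),
]

def make_db(users, trips):
    """Hypothetical helper: build an in-memory SQLite copy of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (users_id INTEGER PRIMARY KEY, banned TEXT, role TEXT)")
    conn.execute("CREATE TABLE Trips (id INTEGER PRIMARY KEY, client_id INTEGER, "
                 "driver_id INTEGER, city_id INTEGER, status TEXT, request_at TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO Trips VALUES (?, ?, ?, ?, ?, ?)", trips)
    return conn

COMPACT_SQL = """
select request_at as Day,
       round(
           -- the CASE yields NULL for completed trips, and count() ignores NULLs
           1.0 * count(case when t.status != 'completed' then 1 end)
           / count(*)
       , 2) as "Cancellation Rate"
from trips t
join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
where t.request_at between '2013-10-01' and '2013-10-03'
group by request_at
order by request_at
"""

conn = make_db(USERS, TRIPS)
compact = conn.execute(COMPACT_SQL).fetchall()
for day, rate in compact:
    print(day, f"{rate:.2f}")
```

Because every group has at least one trip, count(*) is never zero here, so no guard against division by zero is needed.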
Why does your code fail a test?
- "Trips": [["1", "1", "10", "1", "cancelled_by_client", "2013-10-04"]]
- "Users": [["1", "No", "client"], ["10", "No", "driver"]]
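This test fails because your code does not have and request_at between '2013-10-01' and '2013-10-03' in the where clause, so the trip on 2013-10-04 is counted even though it falls outside the required range. The expected result is an empty set, not NULL values. Here's an example you can review: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=58f638d2fccd9a2c4429c345c9dc43cb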
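The effect of the date predicate on this test case can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not the LeetCode judge: it uses SQLite instead of MySQL and a simplified rate query built from the double join, with RATE_SQL and DATE_FILTER as names made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (users_id INTEGER PRIMARY KEY, banned TEXT, role TEXT)")
conn.execute("CREATE TABLE Trips (id INTEGER PRIMARY KEY, client_id INTEGER, "
             "driver_id INTEGER, city_id INTEGER, status TEXT, request_at TEXT)")
# Data from the second test case: one cancelled trip on 2013-10-04.
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)",
                 [(1, "No", "client"), (10, "No", "driver")])
conn.execute("INSERT INTO Trips VALUES (1, 1, 10, 1, 'cancelled_by_client', '2013-10-04')")

# Hypothetical template: the same rate query with an optional date predicate.
RATE_SQL = """
select request_at,
       round(1.0 * count(case when t.status != 'completed' then 1 end) / count(*), 2)
from trips t
join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
{date_filter}
group by request_at
"""
DATE_FILTER = "where t.request_at between '2013-10-01' and '2013-10-03'"

unfiltered = conn.execute(RATE_SQL.format(date_filter="")).fetchall()
filtered = conn.execute(RATE_SQL.format(date_filter=DATE_FILTER)).fetchall()

print("without date filter:", unfiltered)  # the 2013-10-04 trip is reported
print("with date filter:   ", filtered)    # no rows, as the test expects
```

Without the predicate the query reports a rate of 1.0 for 2013-10-04; with it, no rows come back, which is the empty result the test expects.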
Even with the correction to your code as above, another case will fail. Why?
- "Trips": [["1111", "1", "10", "1", "completed", "2013-10-01"]]
- "Users": [["1", "Yes", "client"], ["10", "No", "driver"]]
This case fails because your Requests_Total takes trips from unbanned clients (ignoring the fact that the driver on that trip could be banned) and UNIONs them with trips from unbanned drivers (ignoring the fact that the client on that trip could be banned). You shouldn't union them.
SELECT trips.request_at as Day, COUNT(*) as total_count
FROM trips
INNER JOIN users ON
trips.client_id = users.users_id
WHERE users.banned = "No" AND users.role = "client"
and request_at between '2013-10-01' and '2013-10-03'
GROUP BY trips.request_at
UNION
SELECT trips.request_at as Day, COUNT(*) as total_count
FROM trips
INNER JOIN users ON
trips.driver_Id = users.users_id
WHERE users.banned = "No" AND users.role = "driver"
and request_at between '2013-10-01' and '2013-10-03'
GROUP BY trips.request_at
The fix is to check, for each trip, that both the driver and the client are unbanned. That's what I did in my code.
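To see the difference on the banned-client case above, the sketch below counts requests two ways with Python's sqlite3 module: once joining only on the driver, as one branch of the union does, and once joining on both client and driver. It runs on SQLite rather than MySQL, and DRIVER_ONLY_SQL and BOTH_SIDES_SQL are names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (users_id INTEGER PRIMARY KEY, banned TEXT, role TEXT)")
conn.execute("CREATE TABLE Trips (id INTEGER PRIMARY KEY, client_id INTEGER, "
             "driver_id INTEGER, city_id INTEGER, status TEXT, request_at TEXT)")
# Banned client, unbanned driver: this trip should not be counted at all.
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)",
                 [(1, "Yes", "client"), (10, "No", "driver")])
conn.execute("INSERT INTO Trips VALUES (1111, 1, 10, 1, 'completed', '2013-10-01')")

# Checks only the driver, so a banned client slips through.
DRIVER_ONLY_SQL = """
select t.request_at, count(*) as total_count
from trips t
join users u on t.driver_id = u.users_id
where u.banned = 'No' and u.role = 'driver'
  and t.request_at between '2013-10-01' and '2013-10-03'
group by t.request_at
"""

# Checks client and driver on the same trip row.
BOTH_SIDES_SQL = """
select t.request_at, count(*) as total_count
from trips t
join users uc on t.client_id = uc.users_id and uc.banned = 'No' and uc.role = 'client'
join users ud on t.driver_id = ud.users_id and ud.banned = 'No' and ud.role = 'driver'
where t.request_at between '2013-10-01' and '2013-10-03'
group by t.request_at
"""

driver_only = conn.execute(DRIVER_ONLY_SQL).fetchall()
both_sides = conn.execute(BOTH_SIDES_SQL).fetchall()

print("driver-only check:", driver_only)  # trip 1111 is counted
print("both-sides check: ", both_sides)   # trip 1111 is excluded
```

The driver-only count still reports one request for 2013-10-01, while the double join returns nothing because the client on that trip is banned.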
Here are the results from your code, which you can tweak at your leisure: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=57e6d83db05a5cb4f99e715ccb133032