There are two tables:
Trips (order_id, user_id, order_dt, order_cost) - all taxi trips
Users (user_id, gender) - table of users, where the gender of the user (M / W) is marked
Required: Calculate the difference in the average check for M/W users over a certain period (using any method in SQL, Python, Excel, etc.) Explain why it may differ
My solution to the problem: To calculate the revenue, you need to multiply the average check at the end of the month by the number of customers who made a purchase from you. To calculate the average check, you need to divide the revenue for the day (or the period of interest to us) by the number of checks, the resulting amount is the average check for the day. It can differ only from the number of male and female users.
1)in SQL -- calculation of the average check
SELECT SUM(u.order_cost) / COUNT(u.order_id)
FROM taxi AS t
INNER JOIN users AS u ON t.user_id = u.user_id
GROUP BY u.gender
-- difference I think so
SELECT avg(order_cost) filter(where gender = 'M') - avg(order_cost) filter(where gender = 'W')
FROM taxi AS t
INNER JOIN users AS u ON t.user_id = u.user_id
But they tell me it's wrong. I don't understand where I have an error: in the syntax or should I use a subquery, window functions? I'm new to SQL, so I don't understand how to do it right here.
In Excel, this task is calculated in 2 minutes with pulling up the floor, the summary and the calculated field of the average check.
In Python, I would use the Pandas and Numpy libraries
CodePudding user response:
This
SELECT SUM(u.order_cost) / COUNT(u.order_id)
FROM taxi AS t
INNER JOIN users AS u ON t.user_id = u.user_id
GROUP BY u.gender
Will give two rows -- one for male and one for female.
This will give for male only
SELECT SUM(u.order_cost) / COUNT(u.order_id)
FROM taxi AS t
INNER JOIN users AS u ON t.user_id = u.user_id
WHERE u.gender = 'M'
Here is how I would do it, the simple way
SELECT SUM(AM.AVG_O) / COUNT(AM.USER_ID) AS M_AVG,
SUM(AW.AVG_O) / COUNT(AW.USER_ID) AS W_AVG,
SUM(AM.AVG_O) / COUNT(AM.USER_ID) - SUM(AW.AVG_O) / COUNT(AW.USER_ID) AS DIFF
FROM taxi AS t
JOIN (
SELECT USER_ID, SUM(u.order_cost) / COUNT(u.order_id) AS AVG_O
FROM users
WHERE u.gender = 'M'
GROUP BY USER_ID
) AS AM ON AM.USER_ID = T.USER_ID
JOIN (
SELECT USER_ID, SUM(u.order_cost) / COUNT(u.order_id) AS AVG_O
FROM users
WHERE u.gender = 'W'
GROUP BY USER_ID
) AS AW ON AW.USER_ID = T.USER_ID
You could be fancy and reduce this to only one sub-query but this is clearer to understand. (Use CASE in main select)
Looking at it again I realized it is possible to simplify since you don't care about each user in the outer query. A cross join feels bad, but there is only 1 row in each of these tables so the result is just one row:
SELECT AM.AVG_O AS M_AVG,
AW.AVG_O AS W_AVG,
AM.AVG_O - AW.AVG AS DIFF
FROM taxi AS t
CROSS JOIN (
SELECT SUM(u.order_cost) / COUNT(u.order_id) AS AVG_O
FROM users
WHERE u.gender = 'M'
)
CROSS JOIN (
SELECT SUM(u.order_cost) / COUNT(u.order_id) AS AVG_O
FROM users
WHERE u.gender = 'W'
)