I have asked a similar question and have received some help from some very nice people. How to find the average of all other products in postgresql. This question is not all but I thought I can work out the rest on my own if the hardest part can be resolved but apparently I've overestimated my abilities. So I'm posting another question... :)
The question is as followed.
I have a table Products
which looks like the following:
----------- ----------- ----------
|ProductCode|ProductType| .... |
----------- ----------- ----------
| ref01 | BOOKS | .... |
| ref02 | ALBUMS | .... |
| ref06 | BOOKS | .... |
| ref04 | BOOKS | .... |
| ref07 | ALBUMS | .... |
| ref10 | TOYS | .... |
| ref13 | TOYS | .... |
| ref09 | ALBUMS | .... |
| ref29 | TOYS | .... |
| ref02 | ALBUMS | .... |
| ..... | ..... | .... |
----------- ----------- ----------
Another table Sales
which looks like the following:
----------- ----------- ----------
|ProductCode| qty | .... |
----------- ----------- ----------
| ref01 | 15 | .... |
| ref02 | 12 | .... |
| ref06 | 20 | .... |
| ref04 | 14 | .... |
| ref07 | 11 | .... |
| ref10 | 19 | .... |
| ref13 | 3 | .... |
| ref09 | 9 | .... |
| ref29 | 5 | .... |
| ref02 | 4 | .... |
| ..... | ..... | .... |
----------- ----------- ----------
I am trying to find the products that were ordered 20% more than the average of all other products of the same type.
A product can be ordered several times and the quantities (qty) of each order might not be the same. Such as ref02
in the sample table. I only included one example (ref02) but it is the case for all products. So to find how many times a specific product was ordered would mean to find the sum of quantities ordered from all orders of the product.
By manually calculating, the result should be something like:
----------- ----------- ----------
|ProductCode| qty | .... |
----------- ----------- ----------
| ref02 | 16 | .... |
| ref06 | 20 | .... |
| ref07 | 11 | .... |
| ref10 | 19 | .... |
| ..... | ..... | .... |
----------- ----------- ----------
So if looking in the type ALBUMS
and product ref02, then I need to find the average of Orders of ALL OTHER ALBUMS
.
In this case, it is the average of ref06
and ref04
, but there are more in the actual table. So what I need to do is the following:
Since product ref02 is 'ALBUMS' and there are two orders of ref02, the total orders will be 12 4=16. And ref07 and ref09 are also 'ALBUMS'.
So their average is (11 9)/2=10 < 12 4=16.
Since product ref06 is 'BOOKS', and **ref01** and ref04 are also 'BOOKS'.
So their average is (15 14)/2=14.5 <20.
Since product ref07 is 'ALBUMS', and **ref02** and ref09 are also 'ALBUMS'.
So their average is (12 9 4)/3=8.3 <11.
Since product ref10 is 'TOYS', and ref13 and ref29 are also 'TOYS'
So their average is (3 5)/2=4<19.
The rest does not satisfy the condition thus will not be in the result.
I know how to and was able to find the average of orders for all products under the same type, but I have no idea how to find the average of orders for all other products under the same type.
I know how to find the desired products with the helps I've received from my previous question How to find the average of all other products in postgresql, but that is when there is only one order for each product. I don't know how to proceed if there are multiple orders for each product. This is the "overestimated" bit I've mentioned at the beginning... :(
The answers I've received in my previous question has this problem: DEMO (db<>fiddle). The tables in the demo are much more similar to the ones I'm working with, and as you see, there are many rows for one product. (The duplicated rows are by accident. The values just happened to be the same)
I am using PostgreSQL, but the exercise forbids the use of several keywords including: WITH
, OVER
, LIMIT
, PARTITION
, or LATERAL
. I realize that they are commonly used in most solutions I've found and the ones provided to me, but I cannot use them because no result will be returned otherwise... :(
I know not being allowed to use these keywords can be annoying, but I honestly don't know what to do so please help! :)
CodePudding user response:
I wrote a query for all combinations, Total by Product Code, Total by Product Type and e.t.c. You can calculate the average value if you need using (SUM values / Count Values).
select
main1.product_code,
main1.product_type,
main1.total as "Total by Product Code",
main1.sales_count as "Count by Product Code",
main2.total as "Total by Product Type",
main2.sales_count as "Count by Product Type",
main2.total - main1.total as "Total by Other Products Types (ignore this Product Code)",
main2.sales_count - main1.sales_count as "Count by Other Products Types (ignore this Product Code)"
from
(
select
s.product_code,
p.product_type,
sum(s.qty) as total,
count(*) as sales_count
from
examples.sales s
left join
examples.products p on p.product_code = s.product_code
group by
s.product_code, p.product_type
) main1
left join
(
select t1.product_type, sum(t1.qty) as total, count(*) as sales_count from (
select * from examples.sales s
left join examples.products p on p.product_code = s.product_code
) t1
group by t1.product_type
) main2 on main1.product_type = main2.product_type
Result:
Pr.Code | Pr.Type | Total by Pr.Code | Count by Pr.Code | Total by Pr.Type | Count by Pr.Type (ignore this Product Code) | Total by Other Pr.Types | Count by Other Pr.Types (ignore this Product Code) |
---|---|---|---|---|---|---|---|
ref29 | TOYS | 5 | 1 | 27 | 3 | 22 | 2 |
ref06 | BOOKS | 20 | 1 | 34 | 2 | 14 | 1 |
ref13 | TOYS | 3 | 1 | 27 | 3 | 24 | 2 |
ref02 | ALBUMS | 16 | 2 | 36 | 4 | 20 | 2 |
ref10 | TOYS | 19 | 1 | 27 | 3 | 8 | 2 |
ref07 | ALBUMS | 11 | 1 | 36 | 4 | 25 | 3 |
ref04 | BOOKS | 14 | 1 | 34 | 2 | 20 | 1 |
ref09 | ALBUMS | 9 | 1 | 36 | 4 | 27 | 3 |
CodePudding user response:
Fix two errors in the setup
1.
A product can be ordered several times ...
It should still appear once in the Products
table. The 2nd entry of ref02
is wrong.
2.
So to find how many times a specific product was ordered would mean to find the sum of quantities ordered from all orders of the product.
So your rationale for ref07
doesn't hold:
Since product ref07 is 'ALBUMS', and **ref02** and ref09 are also 'ALBUMS'. So their average is (12 9 4)/3=8.3 <11.
Counting the two sales for ref02
separately is wrong in light of your definition. Operate with sums per product:
Since product ref07 is 'ALBUMS', and ref02 and ref09 are also 'ALBUMS'.
So their average is (16 9)/2 = 12.5 > 11. -- doesn't qualify!
Answer
find the products that were ordered 20% more than the average of all other products of the same type.
I am putting a proper solution first: an efficient query for Postgres 11 using a window function with custom window frame over the aggregate sum()
SELECT product_code, orders
FROM (
SELECT product_code, sum(s.orders) AS orders
, avg(sum(s.orders)) OVER (PARTITION BY p.product_type
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE CURRENT ROW) AS avg_orders
FROM product p
JOIN sales s USING (product_code)
GROUP BY product_code, p.product_type
) sub
WHERE avg_orders * 1.2 < orders
ORDER BY product_code; -- optional
Result (with the errors mentioned above fixed):
product_code | orders |
---|---|
ref02 | 16 |
ref06 | 20 |
ref10 | 19 |
Much more efficient than the below.
Postgres can apply a window function over an aggregate in the same query level. See:
- Postgres window function and group by exception
- How to use a SQL window function to calculate a percentage of an aggregate
At your request, an inefficient solution working around modern SQL features:
SELECT product_code, ps.orders
FROM (
SELECT product_code, p.product_type, sum(s.orders) AS orders
FROM product p
JOIN sales s USING (product_code)
GROUP BY product_code, p.product_type
) ps
JOIN LATERAL (
SELECT avg(orders) AS avg_orders
FROM (
SELECT sum(s1.orders) AS orders
FROM product p1
JOIN sales s1 USING (product_code)
WHERE p1.product_type = ps.product_type
AND p1.product_code <> ps.product_code
GROUP BY product_code
) sub
) a ON a.avg_orders * 1.2 < ps.orders
ORDER BY product_code; -- optional
db<>fiddle here
Same result.
We have to repeat the basic aggregation for sums in the subquery, since we cannot use a CTE to materialize it. (Possible remaining workaround: use a temporary table isntead.)
Basics in my answer to your previous question: