Home > Back-end >  Finding products that were ordered 20% more times than the average of all other products in postgres
Finding products that were ordered 20% more times than the average of all other products in postgres

Time:11-12

I have asked a similar question and have received some help from some very nice people. How to find the average of all other products in postgresql. This question is not all but I thought I can work out the rest on my own if the hardest part can be resolved but apparently I've overestimated my abilities. So I'm posting another question... :)

The question is as followed.

I have a table Products which looks like the following:

 ----------- ----------- ---------- 
|ProductCode|ProductType|   ....   |
 ----------- ----------- ---------- 
|   ref01   |   BOOKS   |   ....   |
|   ref02   |   ALBUMS  |   ....   |
|   ref06   |   BOOKS   |   ....   |
|   ref04   |   BOOKS   |   ....   |
|   ref07   |   ALBUMS  |   ....   |
|   ref10   |   TOYS    |   ....   |
|   ref13   |   TOYS    |   ....   |
|   ref09   |   ALBUMS  |   ....   |
|   ref29   |   TOYS    |   ....   |
|   ref02   |   ALBUMS  |   ....   |
|   .....   |   .....   |   ....   |
 ----------- ----------- ---------- 

Another table Sales which looks like the following:

 ----------- ----------- ---------- 
|ProductCode|    qty    |   ....   |
 ----------- ----------- ---------- 
|   ref01   |     15    |   ....   |
|   ref02   |     12    |   ....   |
|   ref06   |     20    |   ....   |
|   ref04   |     14    |   ....   |
|   ref07   |     11    |   ....   |
|   ref10   |     19    |   ....   |
|   ref13   |      3    |   ....   |
|   ref09   |      9    |   ....   |
|   ref29   |      5    |   ....   |
|   ref02   |      4    |   ....   |
|   .....   |   .....   |   ....   |
 ----------- ----------- ---------- 

I am trying to find the products that were ordered 20% more than the average of all other products of the same type.

A product can be ordered several times and the quantities (qty) of each order might not be the same. Such as ref02 in the sample table. I only included one example (ref02) but it is the case for all products. So to find how many times a specific product was ordered would mean to find the sum of quantities ordered from all orders of the product.

By manually calculating, the result should be something like:

 ----------- ----------- ---------- 
|ProductCode|    qty    |   ....   |
 ----------- ----------- ---------- 
|   ref02   |     16    |   ....   |
|   ref06   |     20    |   ....   |
|   ref07   |     11    |   ....   |
|   ref10   |     19    |   ....   |
|   .....   |   .....   |   ....   |
 ----------- ----------- ---------- 

So if looking in the type ALBUMS and product ref02, then I need to find the average of Orders of ALL OTHER ALBUMS. In this case, it is the average of ref06 and ref04, but there are more in the actual table. So what I need to do is the following:

Since product ref02 is 'ALBUMS' and there are two orders of ref02, the total orders will be 12 4=16. And ref07 and ref09 are also 'ALBUMS'. 
      So their average is (11 9)/2=10 < 12 4=16.

Since product ref06 is 'BOOKS', and **ref01** and ref04 are also 'BOOKS'.                     
      So their average is (15 14)/2=14.5 <20.

Since product ref07 is 'ALBUMS', and **ref02** and ref09 are also 'ALBUMS'.           
      So their average is (12 9 4)/3=8.3 <11.

Since product ref10 is 'TOYS', and ref13 and ref29 are also 'TOYS'           
      So their average is (3 5)/2=4<19.

The rest does not satisfy the condition thus will not be in the result.

I know how to and was able to find the average of orders for all products under the same type, but I have no idea how to find the average of orders for all other products under the same type.

I know how to find the desired products with the helps I've received from my previous question How to find the average of all other products in postgresql, but that is when there is only one order for each product. I don't know how to proceed if there are multiple orders for each product. This is the "overestimated" bit I've mentioned at the beginning... :(

The answers I've received in my previous question has this problem: DEMO (db<>fiddle). The tables in the demo are much more similar to the ones I'm working with, and as you see, there are many rows for one product. (The duplicated rows are by accident. The values just happened to be the same)

I am using PostgreSQL, but the exercise forbids the use of several keywords including: WITH, OVER, LIMIT, PARTITION, or LATERAL. I realize that they are commonly used in most solutions I've found and the ones provided to me, but I cannot use them because no result will be returned otherwise... :(

I know not being allowed to use these keywords can be annoying, but I honestly don't know what to do so please help! :)

CodePudding user response:

I wrote a query for all combinations, Total by Product Code, Total by Product Type and e.t.c. You can calculate the average value if you need using (SUM values / Count Values).

select 
    main1.product_code, 
    main1.product_type, 
    main1.total as "Total by Product Code", 
    main1.sales_count as "Count by Product Code",
    main2.total as "Total by Product Type", 
    main2.sales_count as "Count by Product Type",
    main2.total - main1.total as "Total by Other Products Types (ignore this Product Code)", 
    main2.sales_count - main1.sales_count as "Count by Other Products Types (ignore this Product Code)"
from 
    (
        select 
            s.product_code, 
            p.product_type,  
            sum(s.qty) as total, 
            count(*) as sales_count
        from 
            examples.sales s
        left join 
            examples.products p on p.product_code = s.product_code 
        group by 
            s.product_code, p.product_type
    ) main1 
left join 
    (
        select t1.product_type, sum(t1.qty) as total, count(*) as sales_count from (
            select * from examples.sales s  
            left join examples.products p on p.product_code = s.product_code 
        ) t1 
        group by t1.product_type
    ) main2 on main1.product_type = main2.product_type 

Result:

Pr.Code Pr.Type Total by Pr.Code Count by Pr.Code Total by Pr.Type Count by Pr.Type (ignore this Product Code) Total by Other Pr.Types Count by Other Pr.Types (ignore this Product Code)
ref29 TOYS 5 1 27 3 22 2
ref06 BOOKS 20 1 34 2 14 1
ref13 TOYS 3 1 27 3 24 2
ref02 ALBUMS 16 2 36 4 20 2
ref10 TOYS 19 1 27 3 8 2
ref07 ALBUMS 11 1 36 4 25 3
ref04 BOOKS 14 1 34 2 20 1
ref09 ALBUMS 9 1 36 4 27 3

CodePudding user response:

Fix two errors in the setup

1.

A product can be ordered several times ...

It should still appear once in the Products table. The 2nd entry of ref02 is wrong.

2.

So to find how many times a specific product was ordered would mean to find the sum of quantities ordered from all orders of the product.

So your rationale for ref07 doesn't hold:

Since product ref07 is 'ALBUMS', and **ref02** and ref09 are also 'ALBUMS'.           
      So their average is (12 9 4)/3=8.3 <11.

Counting the two sales for ref02 separately is wrong in light of your definition. Operate with sums per product:

Since product ref07 is 'ALBUMS', and ref02 and ref09 are also 'ALBUMS'.           
      So their average is (16 9)/2 = 12.5 > 11.  -- doesn't qualify!

Answer

find the products that were ordered 20% more than the average of all other products of the same type.

I am putting a proper solution first: an efficient query for Postgres 11 using a window function with custom window frame over the aggregate sum()

SELECT product_code, orders
FROM  (
   SELECT product_code, sum(s.orders) AS orders
        , avg(sum(s.orders)) OVER (PARTITION BY p.product_type
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                                   EXCLUDE CURRENT ROW) AS avg_orders
   FROM   product p
   JOIN   sales s USING (product_code)
   GROUP  BY product_code, p.product_type
   ) sub
WHERE  avg_orders * 1.2 < orders
ORDER  BY product_code;  -- optional

Result (with the errors mentioned above fixed):

product_code orders
ref02 16
ref06 20
ref10 19

Much more efficient than the below.
Postgres can apply a window function over an aggregate in the same query level. See:

At your request, an inefficient solution working around modern SQL features:

SELECT product_code, ps.orders
FROM  (
   SELECT product_code, p.product_type, sum(s.orders) AS orders
   FROM   product p
   JOIN   sales   s USING (product_code)
   GROUP  BY product_code, p.product_type
   ) ps
JOIN   LATERAL (
   SELECT avg(orders) AS avg_orders
   FROM  (
      SELECT sum(s1.orders) AS orders
      FROM   product p1
      JOIN   sales   s1 USING (product_code)
      WHERE  p1.product_type =  ps.product_type
      AND    p1.product_code <> ps.product_code
      GROUP  BY product_code
      ) sub
   ) a ON a.avg_orders * 1.2 < ps.orders
ORDER  BY product_code;  -- optional

db<>fiddle here

Same result.

We have to repeat the basic aggregation for sums in the subquery, since we cannot use a CTE to materialize it. (Possible remaining workaround: use a temporary table isntead.)

Basics in my answer to your previous question:

  • Related