For MySQL, I need a report of rows more than 1 standard deviation from average in percent

Time:01-03

I'm trying to write a MySQL query that finds, for each product, the rows whose cost differs markedly from that product's average cost, and reports each such cost as a percentage of the average: under 100% means the row is priced lower than average for that product, and over 100% means it is priced higher. Rows whose cost is less than 1 standard deviation from the average should be ignored.

Sample data:

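+--------+---------------------+-----------------+---------------+----------+-------+
| _rowid | _timestamp          | code            | fk_product_id | fk_po_id | cost  |
+--------+---------------------+-----------------+---------------+----------+-------+
|   5952 | 2021-01-10 10:19:01 | 00805           |          1367 |      543 | 0.850 |
|   9403 | 2022-05-23 14:54:34 | 00805           |          1367 |     2942 | 0.850 |
|  41595 | 2022-11-23 11:20:26 | 00805           |          1367 |     3391 | 1.350 |
|  39635 | 2022-01-18 12:49:32 | Water1          |           344 |     3153 | 0.140 |
|  40134 | 2022-04-06 22:39:34 | Water1          |           344 |     2747 | 0.190 |
|  41676 | 2022-12-09 16:28:28 | Water1          |           344 |     3398 | 0.140 |
|  39634 | 2022-01-18 12:49:31 | gr309203 344400 |          1024 |     3154 | 0.770 |
|  35634 | 2021-03-03 15:23:23 | gr309203 344400 |          1024 |     3203 | 0.790 |
|  41264 | 2022-11-16 11:41:44 | gr309203 344400 |          1024 |     3357 | 0.970 |
+--------+---------------------+-----------------+---------------+----------+-------+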

My current query:

SELECT code, fk_product_id, cost, cost/
  (SELECT avg(cost) FROM po_line aa WHERE aa.code = code) AS percent
FROM po_line
WHERE (SELECT STDDEV(cost) FROM po_line ss WHERE ss.code = code)>1;

This doesn't return any rows, but there are three lines (one for each product) that should show in the report.

The expected result should be:

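+-----------------+---------------+-------+---------+
| code            | fk_product_id | cost  | percent |
+-----------------+---------------+-------+---------+
| 00805           |          1367 | 1.350 |     133 |
| Water1          |           344 | 0.190 |     121 |
| gr309203 344400 |          1024 | 0.970 |     115 |
+-----------------+---------------+-------+---------+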

CodePudding user response:
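
The attempted query returns no rows for two reasons. Inside each subquery, the unqualified name code resolves to the subquery's own table (aa or ss), so aa.code = code compares a column with itself and the aggregate effectively runs over the whole table instead of one product. Separately, the WHERE clause tests whether the standard deviation itself exceeds 1, which is never the case for costs this small, instead of testing how far each row is from its average. Here is a minimal sketch of the same correlated-subquery approach with both problems corrected. The outer alias p is my own assumption, not something from the original query.

```sql
-- Sketch: the correlated-subquery approach with the outer table aliased
-- as p (an assumed alias) so each subquery is tied to the current row.
SELECT p.code, p.fk_product_id, p.cost,
  ROUND(p.cost * 100 /
    (SELECT AVG(aa.cost) FROM po_line aa WHERE aa.code = p.code)) AS percent
FROM po_line p
-- Keep only rows more than one standard deviation away from the
-- average cost of rows sharing the same code.
WHERE ABS(p.cost - (SELECT AVG(aa.cost) FROM po_line aa WHERE aa.code = p.code))
    > (SELECT STDDEV(ss.cost) FROM po_line ss WHERE ss.code = p.code);
```

This form does not need window functions, so it should also run on servers older than MySQL 8.0, though evaluating the subqueries per row may be slower on a large table.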

The following query uses window functions to calculate, for each row, how many standard deviations its cost lies from the average cost for its code, and what percentage of that average the cost represents.

SELECT code, cost, avg, std, 
  ABS(cost-avg)/std AS num_std, 
  cost*100/avg AS pct
FROM (
  SELECT code, cost,
    AVG(cost) OVER (PARTITION BY code) AS avg,
    STDDEV(cost) OVER (PARTITION BY code) AS std
  FROM po_line
) AS p;

Result:

+-----------------+-------+-----------+---------------------+--------------------+-------------+
| code            | cost  | avg       | std                 | num_std            | pct         |
+-----------------+-------+-----------+---------------------+--------------------+-------------+
| 00805           | 0.850 | 1.0166667 | 0.23570226039551592 | 0.7071069226079035 |  83.6065546 |
| 00805           | 0.850 | 1.0166667 | 0.23570226039551592 | 0.7071069226079035 |  83.6065546 |
| 00805           | 1.350 | 1.0166667 | 0.23570226039551592 | 1.4142134209517383 | 132.7868809 |
| gr309203 344400 | 0.770 | 0.8433333 |  0.0899382504215469 | 0.8153738777025533 |  91.3043514 |
| gr309203 344400 | 0.790 | 0.8433333 |  0.0899382504215469 | 0.5929990827042229 |  93.6758930 |
| gr309203 344400 | 0.970 | 0.8433333 |  0.0899382504215469 | 1.4083740722807512 | 115.0197674 |
| Water1          | 0.140 | 0.1566667 | 0.02357022603955158 | 0.7071081954001099 |  89.3616831 |
| Water1          | 0.190 | 0.1566667 | 0.02357022603955158 | 1.4142121481595331 | 121.2765699 |
| Water1          | 0.140 | 0.1566667 | 0.02357022603955158 | 0.7071081954001099 |  89.3616831 |
+-----------------+-------+-----------+---------------------+--------------------+-------------+

(Note that window functions require MySQL 8.0 or later.)

That query only shows how the calculations are done. To get the result you want, keep just the rows that are more than 1 standard deviation from the average and round the percentage:

SELECT code, fk_product_id, cost, ROUND(cost*100/avg) AS pct
FROM (
  SELECT code, cost, fk_product_id,
    AVG(cost) OVER (PARTITION BY code) AS avg,
    STDDEV(cost) OVER (PARTITION BY code) AS std
  FROM po_line
) AS p
WHERE ABS(cost-avg)/std > 1;

+-----------------+---------------+-------+------+
| code            | fk_product_id | cost  | pct  |
+-----------------+---------------+-------+------+
| 00805           |          1367 | 1.350 |  133 |
| gr309203 344400 |          1024 | 0.970 |  115 |
| Water1          |           344 | 0.190 |  121 |
+-----------------+---------------+-------+------+
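
Two details of that filter may be worth checking against your own data. In MySQL, STDDEV() is a synonym for STDDEV_POP(), the population standard deviation. With only a few purchase lines per code you might prefer the sample version, STDDEV_SAMP(). Also, dividing by std yields NULL when every cost for a code is identical, so those rows drop out of the result without any error. Below is a hedged variant that compares the deviation to the standard deviation directly instead of dividing. The aliases avg_cost and std_cost and the window name w are my own assumptions.

```sql
-- Sketch: same report using the sample standard deviation and a named window.
-- avg_cost, std_cost and w are illustrative names, not from the original.
SELECT code, fk_product_id, cost, ROUND(cost * 100 / avg_cost) AS pct
FROM (
  SELECT code, fk_product_id, cost,
    AVG(cost) OVER w AS avg_cost,
    STDDEV_SAMP(cost) OVER w AS std_cost
  FROM po_line
  WINDOW w AS (PARTITION BY code)
) AS p
-- A row qualifies only if its distance from the average exceeds the
-- standard deviation; a NULL std_cost makes the comparison not true.
WHERE ABS(cost - avg_cost) > std_cost;
```

On the sample data this should return the same three rows, since each of them lies about 1.15 sample standard deviations from its average and every other row lies less than 1 away.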