I'm trying to write a MySQL query to calculate the rows that are much different than the most common for each product and report percent difference where under 100% is lower priced than average for that product and over 100% is higher priced. Ignoring prices that are less than 1 standard deviation from average.
Sample data:
_rowid | _timestamp | code | fk_product_id | fk_po_id | cost |
---|---|---|---|---|---|
5952 | 2021-01-10 10:19:01 | 00805 | 1367 | 543 | 0.850 |
9403 | 2022-05-23 14:54:34 | 00805 | 1367 | 2942 | 0.850 |
41595 | 2022-11-23 11:20:26 | 00805 | 1367 | 3391 | 1.350 |
39635 | 2022-01-18 12:49:32 | Water1 | 344 | 3153 | 0.140 |
40134 | 2022-04-06 22:39:34 | Water1 | 344 | 2747 | 0.190 |
41676 | 2022-12-09 16:28:28 | Water1 | 344 | 3398 | 0.140 |
39634 | 2022-01-18 12:49:31 | gr309203 344400 | 1024 | 3154 | 0.770 |
35634 | 2021-03-03 15:23:23 | gr309203 344400 | 1024 | 3203 | 0.790 |
41264 | 2022-11-16 11:41:44 | gr309203 344400 | 1024 | 3357 | 0.970 |
SELECT code, fk_product_id, cost, cost/
(SELECT avg(cost) FROM po_line aa WHERE aa.code = code) AS percent
FROM po_line
WHERE (SELECT STDDEV(cost) FROM po_line ss WHERE ss.code = code)>1;
This doesn't return any rows, but there are three lines (one for each product) that should show in the report.
The expected result should be:
code | fk_product_id | cost | percent |
---|---|---|---|
00805 | 1367 | 1.350 | 133 |
Water1 | 344 | 0.190 | 121 |
gr309203 344400 | 1024 | 0.970 | 115 |
CodePudding user response:
This query shows how to use window functions to calculate the number of standard deviations and the percentage of cost relative to the average cost for each given code.
SELECT code, cost, avg, std,
ABS(cost-avg)/std AS num_std,
cost*100/avg AS pct
FROM (
SELECT code, cost,
AVG(cost) OVER (PARTITION BY code) AS avg,
STDDEV(cost) OVER (PARTITION BY code) AS std
FROM po_line
) AS p;
Result:
----------------- ------- ----------- --------------------- -------------------- -------------
| code | cost | avg | std | num_std | pct |
----------------- ------- ----------- --------------------- -------------------- -------------
| 00805 | 0.850 | 1.0166667 | 0.23570226039551592 | 0.7071069226079035 | 83.6065546 |
| 00805 | 0.850 | 1.0166667 | 0.23570226039551592 | 0.7071069226079035 | 83.6065546 |
| 00805 | 1.350 | 1.0166667 | 0.23570226039551592 | 1.4142134209517383 | 132.7868809 |
| gr309203 344400 | 0.770 | 0.8433333 | 0.0899382504215469 | 0.8153738777025533 | 91.3043514 |
| gr309203 344400 | 0.790 | 0.8433333 | 0.0899382504215469 | 0.5929990827042229 | 93.6758930 |
| gr309203 344400 | 0.970 | 0.8433333 | 0.0899382504215469 | 1.4083740722807512 | 115.0197674 |
| Water1 | 0.140 | 0.1566667 | 0.02357022603955158 | 0.7071081954001099 | 89.3616831 |
| Water1 | 0.190 | 0.1566667 | 0.02357022603955158 | 1.4142121481595331 | 121.2765699 |
| Water1 | 0.140 | 0.1566667 | 0.02357022603955158 | 0.7071081954001099 | 89.3616831 |
----------------- ------- ----------- --------------------- -------------------- -------------
(Note that window functions require MySQL 8.0).
That query just shows you how the calculations are done. To get the result you want:
SELECT code, fk_product_id, cost, ROUND(cost*100/avg) AS pct
FROM (
SELECT code, cost, fk_product_id,
AVG(cost) OVER (PARTITION BY code) AS avg,
STDDEV(cost) OVER (PARTITION BY code) AS std
FROM po_line
) AS p
WHERE ABS(cost-avg)/std > 1;
----------------- --------------- ------- ------
| code | fk_product_id | cost | pct |
----------------- --------------- ------- ------
| 00805 | 1367 | 1.350 | 133 |
| gr309203 344400 | 1024 | 0.970 | 115 |
| Water1 | 344 | 0.190 | 121 |
----------------- --------------- ------- ------