Please consider this records:
Id Type Price
---------------------------
1 1 100
2 2 200
3 1 150
4 1 300
5 2 200
6 2 400
7 2 500
Based on this Article: Simple Anomaly Detection Using Plain SQL I want to calculate Z-Score based on every Type. I mean first I want to partition records based on Type
column and then in every partition I calculate Z-Score. I wrote this query but I don't know how can I combine this with Partition
function:
WITH series AS (
SELECT Price
FROM MyTable AS n
),
stats AS (
SELECT avg(Price) series_avg,
stddev(Price) as series_stddev
FROM series
),
zscores AS (
SELECT Id, Type, Price, (price - series_avg) / series_stddev AS zscore
FROM series, stats
)
SELECT *,case when zscore NOT BETWEEN -1 AND 1 then 'T' else 'F' end AS is_anomaly
FROM zscores;
Thanks
CodePudding user response:
When you calculate the AVG & STDEV simply group by the Type.
Then join the data to the summary on the Type.
WITH CTE_DATA AS ( SELECT Id, [Type], Price FROM YourTable ) , CTE_STATS AS ( SELECT [Type] , AVG(Price) AS series_avg , STDEV(Price) AS series_stddev FROM CTE_DATA GROUP BY [Type] ) , CTE_ZSCORES AS ( SELECT d.Id, d.[Type], d.Price , (d.price - st.series_avg) / st.series_stddev AS zscore FROM CTE_DATA d JOIN CTE_STATS st ON d.[Type] = st.[Type] ) SELECT * , CASE WHEN zscore NOT BETWEEN -1 AND 1 THEN 'T' ELSE 'F' END AS is_anomaly FROM CTE_ZSCORES ORDER BY [Type], Id;
Id | Type | Price | zscore | is_anomaly -: | ---: | ----: | -----------------: | :--------- 1 | 1 | 100 | -0.797438205949334 | F 3 | 1 | 150 | -0.317053744534072 | F 4 | 1 | 300 | 1.12409963971171 | T 2 | 2 | 200 | -0.833333333333333 | F 5 | 2 | 200 | -0.833333333333333 | F 6 | 2 | 400 | 0.5 | F 7 | 2 | 500 | 1.16666666666667 | T
Or use window functions with partitioning.
SELECT * , CASE WHEN zscore NOT BETWEEN -1 AND 1 THEN 'T' ELSE 'F' END AS is_anomaly FROM ( SELECT Id, [Type], Price , zscore = (Price - AVG(Price) OVER (PARTITION BY [Type])) / STDEV(Price) OVER (PARTITION BY [Type]) FROM YourTable ) q ORDER BY [Type], Id
Id | Type | Price | zscore | is_anomaly -: | ---: | ----: | -----------------: | :--------- 1 | 1 | 100 | -0.797438205949334 | F 3 | 1 | 150 | -0.317053744534072 | F 4 | 1 | 300 | 1.12409963971171 | T 2 | 2 | 200 | -0.833333333333333 | F 5 | 2 | 200 | -0.833333333333333 | F 6 | 2 | 400 | 0.5 | F 7 | 2 | 500 | 1.16666666666667 | T
Test on db<>fiddle here
CodePudding user response:
You can do this with a single scan of the base table and without any joins, using window functions
SELECT
t.Id,
t.Type,
t.Price,
(Price - AVG(t.Price) OVER (PARTITION BY t.Type)) /
STDEV(t.Price) OVER (PARTITION BY t.Type) as zscore
FROM MyTable t