Home > Blockchain >  Get records Z-Score based on their type
Get records Z-Score based on their type

Time:04-06

Please consider this records:

Id        Type       Price
---------------------------
 1         1          100
 2         2          200
 3         1          150
 4         1          300
 5         2          200
 6         2          400
 7         2          500

Based on this Article: Simple Anomaly Detection Using Plain SQL I want to calculate Z-Score based on every Type. I mean first I want to partition records based on Type column and then in every partition I calculate Z-Score. I wrote this query but I don't know how can I combine this with Partition function:

WITH series AS (
SELECT Price
FROM MyTable AS n
),
     stats AS (
     SELECT avg(Price) series_avg,
            stddev(Price) as series_stddev
FROM series
),
zscores AS (
              SELECT  Id, Type, Price, (price - series_avg) / series_stddev AS zscore
              FROM  series, stats
           )
SELECT *,case when zscore NOT BETWEEN -1 AND 1 then 'T' else 'F' end AS is_anomaly
FROM zscores;

Thanks

CodePudding user response:

When you calculate the AVG & STDEV simply group by the Type.

Then join the data to the summary on the Type.

WITH CTE_DATA AS (
  SELECT Id, [Type], Price
  FROM YourTable
)
, CTE_STATS AS (
  SELECT
    [Type]
  , AVG(Price) AS series_avg
  , STDEV(Price) AS series_stddev
  FROM CTE_DATA
  GROUP BY [Type]
)
, CTE_ZSCORES AS (
  SELECT d.Id, d.[Type], d.Price
  , (d.price - st.series_avg) / st.series_stddev AS zscore
  FROM CTE_DATA d
  JOIN CTE_STATS st ON d.[Type] = st.[Type]
)
SELECT *
, CASE 
  WHEN zscore NOT BETWEEN -1 AND 1 
  THEN 'T' 
  ELSE 'F' 
  END AS is_anomaly
FROM CTE_ZSCORES
ORDER BY [Type], Id;
Id | Type | Price |             zscore | is_anomaly
-: | ---: | ----: | -----------------: | :---------
 1 |    1 |   100 | -0.797438205949334 | F         
 3 |    1 |   150 | -0.317053744534072 | F         
 4 |    1 |   300 |   1.12409963971171 | T         
 2 |    2 |   200 | -0.833333333333333 | F         
 5 |    2 |   200 | -0.833333333333333 | F         
 6 |    2 |   400 |                0.5 | F         
 7 |    2 |   500 |   1.16666666666667 | T         

Or use window functions with partitioning.

SELECT *
, CASE 
  WHEN zscore NOT BETWEEN -1 AND 1 
  THEN 'T' 
  ELSE 'F' 
  END AS is_anomaly
FROM
(
  SELECT Id, [Type], Price
  , zscore = (Price - AVG(Price) OVER (PARTITION BY [Type])) / STDEV(Price) OVER (PARTITION BY [Type])
  FROM YourTable
) q
ORDER BY [Type], Id
Id | Type | Price |             zscore | is_anomaly
-: | ---: | ----: | -----------------: | :---------
 1 |    1 |   100 | -0.797438205949334 | F         
 3 |    1 |   150 | -0.317053744534072 | F         
 4 |    1 |   300 |   1.12409963971171 | T         
 2 |    2 |   200 | -0.833333333333333 | F         
 5 |    2 |   200 | -0.833333333333333 | F         
 6 |    2 |   400 |                0.5 | F         
 7 |    2 |   500 |   1.16666666666667 | T         

Test on db<>fiddle here

CodePudding user response:

You can do this with a single scan of the base table and without any joins, using window functions

SELECT
  t.Id,
  t.Type,
  t.Price,
  (Price - AVG(t.Price) OVER (PARTITION BY t.Type)) /
    STDEV(t.Price) OVER (PARTITION BY t.Type) as zscore
FROM MyTable t

db<>fiddle

  • Related