I build a VERY simple model with only a time series and a data field of always 1 to find anomalies.
CREATE OR REPLACE MODEL `mytest.dummy`
OPTIONS(
model_type='arima_plus',
TIME_SERIES_DATA_COL='cnt',
TIME_SERIES_TIMESTAMP_COL='ts',
DATA_FREQUENCY='HOURLY',
DECOMPOSE_TIME_SERIES=TRUE
)
AS
select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts;
Model works fine unless I use a custom select
query to find anomalies. Even if the query is the exact same that was used to create the model.
SELECT *
FROM ML.DETECT_ANOMALIES(
MODEL `mytest.dummy`,
STRUCT (0.9 AS anomaly_prob_threshold),
(select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts)
)
Result
Row | ts | cnt | is_anomaly | lower_bound | upper_bound | anomaly_probability |
---|---|---|---|---|---|---|
1 | 2022-05-01 00:00:00 UTC | 1.0 | null | null | null | null |
2 | 2022-05-01 01:00:00 UTC | 1.0 | null | null | null | null |
3 | .... |
Does anyone know what I need to do to get expected results of is_anomaly = false
.
CodePudding user response:
After a closer look into the documentation I found out that anomaly detection works only outside of the training range - at least for new queries and only as far as as the HORIZON
goes (by the time of writing the default is 1.000).
Historical data can be also classified, but only without a query and only if the parameter DECOMPOSE_TIME_SERIES
is set to true
.
The example above would look like this:
CREATE OR REPLACE MODEL `mytest.dummy`
OPTIONS(
model_type='arima_plus',
TIME_SERIES_DATA_COL='cnt',
TIME_SERIES_TIMESTAMP_COL='ts'
)
AS
select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts;
The query with the next days
SELECT *
FROM ML.DETECT_ANOMALIES(
MODEL `mytest.dummy`,
STRUCT (0.9 AS anomaly_prob_threshold),
(select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-03', '2022-05-04', INTERVAL 1 HOUR)) as ts)
)
Result
Row | ts | cnt | is_anomaly | lower_bound | upper_bound | anomaly_probability |
---|---|---|---|---|---|---|
1 | 2022-05-03 00:00:00 UTC | 1.0 | false | 1.0 | 1.0 | 0.0 |
2 | 2022-05-04 01:00:00 UTC | 1.0 | false | 1.0 | 1.0 | 0.0 |
3 | .... |