Home > Net >  BigQuery ML.DETECT_ANOMALIES with model 'arima_plus' returns only nulls
BigQuery ML.DETECT_ANOMALIES with model 'arima_plus' returns only nulls

Time:05-23

I build a VERY simple model with only a time series and a data field of always 1 to find anomalies.

CREATE OR REPLACE MODEL `mytest.dummy`
OPTIONS(
    model_type='arima_plus',
    TIME_SERIES_DATA_COL='cnt',
    TIME_SERIES_TIMESTAMP_COL='ts',
    DATA_FREQUENCY='HOURLY',
    DECOMPOSE_TIME_SERIES=TRUE
)
AS
select ts, 1 cnt 
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts;

Model works fine unless I use a custom select query to find anomalies. Even if the query is the exact same that was used to create the model.

SELECT * 
FROM ML.DETECT_ANOMALIES(
    MODEL `mytest.dummy`,
    STRUCT (0.9 AS anomaly_prob_threshold),
    (select ts, 1 cnt 
     from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts)
)

Result

Row ts cnt is_anomaly lower_bound upper_bound anomaly_probability
1 2022-05-01 00:00:00 UTC 1.0 null null null null
2 2022-05-01 01:00:00 UTC 1.0 null null null null
3 ....

Does anyone know what I need to do to get expected results of is_anomaly = false.

CodePudding user response:

After a closer look into the documentation I found out that anomaly detection works only outside of the training range - at least for new queries and only as far as as the HORIZONgoes (by the time of writing the default is 1.000).

Historical data can be also classified, but only without a query and only if the parameter DECOMPOSE_TIME_SERIES is set to true.

The example above would look like this:

CREATE OR REPLACE MODEL `mytest.dummy`
OPTIONS(
    model_type='arima_plus',
    TIME_SERIES_DATA_COL='cnt',
    TIME_SERIES_TIMESTAMP_COL='ts'
)
AS
select ts, 1 cnt 
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts;

The query with the next days

SELECT * 
FROM ML.DETECT_ANOMALIES(
    MODEL `mytest.dummy`,
    STRUCT (0.9 AS anomaly_prob_threshold),
    (select ts, 1 cnt 
     from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-03', '2022-05-04', INTERVAL 1 HOUR)) as ts)
)

Result

Row ts cnt is_anomaly lower_bound upper_bound anomaly_probability
1 2022-05-03 00:00:00 UTC 1.0 false 1.0 1.0 0.0
2 2022-05-04 01:00:00 UTC 1.0 false 1.0 1.0 0.0
3 ....
  • Related