I am using the bsts package to analyze several time series, to find out whether the values in each series are increasing, decreasing, or remaining stable over the time period.
I am still learning Bayesian structural models, so forgive me for any imprecision. I see that I can extract a trend component from the model, which looks like it is increasing (code below). But how can I formally test the direction of the trend? Should I remove the local linear trend and instead do a regression against time? Or is there a way to answer that question using the trend component already extracted?
Here is one of the time series:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 5.454450 5.351702 5.450414 5.490382 5.367216 5.404927 5.267974 5.211403 5.439326 5.394489 5.425333 5.413367
2014 5.365040 5.359957 5.388980 5.272279 5.337346 5.383114 5.211803 5.567671 5.446918 5.427461 5.490386 5.429216
2015 5.522443 5.432766 5.581413 5.307201 5.452850 5.426128 5.372678 5.524919 5.400167 5.453443 5.596288 5.424220
2016 5.633506 5.553917 5.553649 5.444297 5.551022 5.461216 5.503183 5.426033 5.454331 5.643385 5.575780 5.486073
2017 5.656180 5.411885 5.477615 5.437005 5.434144 5.404344 5.481005 5.437255 5.352800 5.485022 5.441534 5.472936
2018 5.366347 5.381583 5.401431 5.479976 5.439315 5.319484 5.421672 5.319448 5.574673 5.472161 5.479900 5.539380
2019 5.390139 5.429426 5.613977 5.343529 5.487940 5.624361 5.381285 5.366611 5.565688 5.503865 5.491821 5.486168
2020 5.475343 5.493866 6.010556 5.690947 5.656557 5.420500 5.453484 5.566972 5.369799 5.435967 5.613358 5.409345
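For reproducibility, this is the series entered as the monthly ts object y that the code below uses (values copied from the table above):
y <- ts(c(
  5.454450, 5.351702, 5.450414, 5.490382, 5.367216, 5.404927, 5.267974, 5.211403, 5.439326, 5.394489, 5.425333, 5.413367,  # 2013
  5.365040, 5.359957, 5.388980, 5.272279, 5.337346, 5.383114, 5.211803, 5.567671, 5.446918, 5.427461, 5.490386, 5.429216,  # 2014
  5.522443, 5.432766, 5.581413, 5.307201, 5.452850, 5.426128, 5.372678, 5.524919, 5.400167, 5.453443, 5.596288, 5.424220,  # 2015
  5.633506, 5.553917, 5.553649, 5.444297, 5.551022, 5.461216, 5.503183, 5.426033, 5.454331, 5.643385, 5.575780, 5.486073,  # 2016
  5.656180, 5.411885, 5.477615, 5.437005, 5.434144, 5.404344, 5.481005, 5.437255, 5.352800, 5.485022, 5.441534, 5.472936,  # 2017
  5.366347, 5.381583, 5.401431, 5.479976, 5.439315, 5.319484, 5.421672, 5.319448, 5.574673, 5.472161, 5.479900, 5.539380,  # 2018
  5.390139, 5.429426, 5.613977, 5.343529, 5.487940, 5.624361, 5.381285, 5.366611, 5.565688, 5.503865, 5.491821, 5.486168,  # 2019
  5.475343, 5.493866, 6.010556, 5.690947, 5.656557, 5.420500, 5.453484, 5.566972, 5.369799, 5.435967, 5.613358, 5.409345), # 2020
  start = c(2013, 1), frequency = 12)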
And here is the code I used to extract the trend (adapted from https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/); y is the time series above:
library(bsts)
library(zoo)       # as.yearmon()/as.Date() to turn the ts time index into dates
library(reshape2)  # melt()
library(ggplot2)

ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
bsts.model <- bsts(y, state.specification = ss, niter = 500, ping = 0, seed = 2016)
burn <- SuggestBurn(0.1, bsts.model)
components <- cbind.data.frame(
  colMeans(bsts.model$state.contributions[-(1:burn), "trend", ]),
  colMeans(bsts.model$state.contributions[-(1:burn), "seasonal.12.1", ]),
  as.Date(as.yearmon(time(y))))
names(components) <- c("Trend", "Seasonality", "Date")
components <- melt(components, id = "Date")
names(components) <- c("Date", "Component", "Value")
ggplot(data = components, aes(x = Date, y = Value)) + geom_line() +
  theme_bw() + theme(legend.title = element_blank()) + ylab("") + xlab("") +
  facet_grid(Component ~ ., scales = "free") + guides(colour = FALSE) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0))
CodePudding user response:
This can be done by inspecting the draws more closely.
Let's take this apart:
colMeans(bsts.model$state.contributions[-(1:burn), "trend",])
bsts.model$state.contributions is an array of shape (iter, metric, obs).
bsts.model$state.contributions[-(1:burn), "trend", ] turns this into a (non_burnin_iter, obs) matrix.
The call to colMeans gets the average trend, but you can just as well extract any quantile, or the share of draws that satisfy some property:
m <- bsts.model$state.contributions[-(1:burn), "trend", ]
# Average trend, what you are getting now.
colMeans(m)
# Posterior probability that the effect of the local trend is positive
# Probably what you want though note that it's conceptually not the
# same as a p-value. It's more what people think 1-pvalue is.
colMeans(m > 0)
# Lower bound of the 95% credible interval of the trend.
# Useful if you have a region of practical equivalence (ROPE).
apply(m, 2, quantile, p = 0.025)
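If you want a single number for the overall direction, one option (a sketch building on the m matrix above, not something bsts computes for you) is to look at the per-draw change in the trend between the first and last time points:
# Per-draw difference between the trend at the last and at the first time point.
delta <- m[, ncol(m)] - m[, 1]
# Posterior probability that the trend ends higher than it starts.
mean(delta > 0)
# 95% credible interval for the total change over the sample period.
quantile(delta, probs = c(0.025, 0.975))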
CodePudding user response:
I thought of a solution (code below); please let me know if it makes sense.
Basically, for every point in the time series, I:
1. check whether the lower limit of its 95% credible interval is bigger than the upper limit of any of the points before it;
2. check whether the upper limit of its 95% credible interval is smaller than the lower limit of any of the points before it.
This way, if the time series is stable, both counts 1 and 2 should be small. If it is increasing, 1 should be bigger than 2. If it is decreasing, 2 should be bigger than 1. If it is fluctuating, both 1 and 2 should be big.
Let's say that if the difference between counts 1 and 2 is less than 10% of the time series length, I consider the series stable (see the sketch after the code below).
Any reason this could give me misleading results?
library(magrittr)  # for the %>% pipe

components <- cbind.data.frame(
  apply(bsts.model$state.contributions[-(1:burn), "trend", ], 2, quantile, p = 0.025),
  apply(bsts.model$state.contributions[-(1:burn), "trend", ], 2, quantile, p = 0.975))
names(components) <- c("Trend_lci", "Trend_uci")
# Count 1: points whose lower CI limit is above the upper limit of some earlier point.
test_increasing <-
  sapply(1:nrow(components), function(i){
    any(components$Trend_lci[i] > components$Trend_uci[1:i])
  }) %>% sum
# Count 2: points whose upper CI limit is below the lower limit of some earlier point.
test_decreasing <-
  sapply(1:nrow(components), function(i){
    any(components$Trend_uci[i] < components$Trend_lci[1:i])
  }) %>% sum
test_increasing - test_decreasing
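And here is a small sketch of the classification rule I described above (taking the absolute difference and the 10% threshold, both of which are just my first guess):
n <- nrow(components)
if (abs(test_increasing - test_decreasing) < 0.1 * n) {
  trend_direction <- "stable"
} else if (test_increasing > test_decreasing) {
  trend_direction <- "increasing"
} else {
  trend_direction <- "decreasing"
}
trend_direction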