Multivariate time series - is there notation to select all the variables, or do they all have to be

Time:07-04

I'm working to build a multivariate time series to make predictions about labor in the United States. The fpp3 package is excellent, but I don't see a notation to model all the variables.

For example, in linear regression, it's possible to do this:

library(tidyverse)
mtcars.lm <-  lm(mpg ~ ., data = mtcars)
summary(mtcars.lm)

to model mpg on all the remaining variables, without having to write all the variables out explicitly. Is there something similar in time series using the fpp3 package?
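As a small illustration of what the dot stands for in `lm()`, base R can expand it against the data. This is only a sketch using `terms()` from the stats package and the built-in mtcars data; it says nothing about how fpp3 treats the dot.

```r
# Expand the "." in the formula against the columns of mtcars
expanded <- terms(mpg ~ ., data = mtcars)

# The term labels are every column of mtcars except the response, mpg
predictors <- attr(expanded, "term.labels")
print(predictors)

# The same formula with the dot written out in full
print(formula(expanded))
```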

For example, this returns an error:

library(tidyverse)
library(fpp3)
library(clock)

# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0000000001
All_Employees <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/All_Employees.csv', col_select = c(Label, Value), show_col_types = FALSE)
All_Employees <- All_Employees %>%
  rename(Month = Label, Total_Employees = Value)
All_Employees <- All_Employees %>%
  mutate(Month = yearmonth(Month)) %>% 
  as_tsibble(index = Month) %>% 
  mutate(Total_Employees_Diff = difference(Total_Employees))

index = All_Employees$Month

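# NOTE: start_month and end_month are not defined anywhere in this snippet;
# they must be set to yearmonth values before this filter (and the one below) will run.
All_Employees <- All_Employees %>% 
  filter((Month >= start_month), (Month <= end_month))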

# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0500000003
Average_Hourly_Earnings <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/Average_Hourly_Earnings.csv', col_select = c(Label, Value), show_col_types = FALSE)
Average_Hourly_Earnings <- Average_Hourly_Earnings %>%
  rename(Month = Label, Avg_Hourly_Earnings = Value)
Average_Hourly_Earnings <- Average_Hourly_Earnings %>% 
  mutate(Month = yearmonth(Month)) %>% 
  as_tsibble(index = Month) %>% 
  mutate(Avg_Hourly_Earnings_Diff = difference(Avg_Hourly_Earnings))

Average_Hourly_Earnings <- Average_Hourly_Earnings %>% 
  filter((Month >= start_month), (Month <= end_month))

Monthly_labor_data_small <- 
  tsibble(
    Month = All_Employees$Month,
    index = Month,
    'Total_Employees' = All_Employees$Total_Employees,
    'Avg_Earnings' = Average_Hourly_Earnings$Avg_Hourly_Earnings
  )

start_month_small = yearmonth("2020 Mar")
end_month_small = yearmonth("2022 Jan")  

Monthly_labor_data_small <- Monthly_labor_data_small %>% 
  filter((Month >= start_month_small), (Month <= end_month_small))


Monthly_labor_data_small %>% 
  model(
  linear = TSLM(Total_Employees ~ .,))

The error is: Error in TSLM(Total_Employees ~ ., ) : unused argument (alist())
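The wording of that message matches what base R reports when a call ends with a trailing comma, because the comma passes an extra, empty argument. A minimal sketch follows, using a hypothetical one-argument function `f` as a stand-in for TSLM (assumed here to take a single formula argument). It only explains the message; it does not imply that `~ .` is supported by TSLM once the comma is removed.

```r
# Hypothetical stand-in for a model function that accepts one formula argument
f <- function(formula) formula

# The trailing comma after the formula supplies a second, empty argument,
# which f() has no parameter for; capture the error message instead of stopping
msg <- tryCatch(
  f(y ~ ., ),
  error = function(e) conditionMessage(e)
)
print(msg)
```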

But this runs fine if I list everything out:

fit <- Monthly_labor_data_small %>% 
  model(
  linear = TSLM(Total_Employees ~ Avg_Earnings + season() + trend()))

report(fit)

The full tsibble will have a large number of columns. Is there a short way to list all of them, similar to what can be done in linear regression?

CodePudding user response:

You should be able to do something like

resp <- "Total_Employees"
form <- reformulate(response = resp,
   c(setdiff(names(Monthly_labor_data_small), resp),
    "season()", "trend()"))

Then use form in your model. I haven't tried your examples. If there are other variables (like a time index) that should not be explicitly included in the model, the second argument to setdiff() should be c(resp, "excluded_var2", "excluded_var3").
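The approach above can be sketched with base R alone. The column names below are a hypothetical stand-in for `names(Monthly_labor_data_small)`, and `index_var` is an illustrative name for the time index that is excluded from the predictors. Passing the resulting formula to TSLM is what the answer suggests, but it is untested here.

```r
# Hypothetical stand-in for names(Monthly_labor_data_small)
cols <- c("Month", "Total_Employees", "Avg_Earnings")

resp <- "Total_Employees"  # response variable
index_var <- "Month"       # time index, excluded from the predictors

# Every column except the response and the index becomes a predictor
predictors <- setdiff(cols, c(resp, index_var))

# Build the formula, appending the season() and trend() terms
form <- reformulate(c(predictors, "season()", "trend()"), response = resp)
print(form)

# Untested, as the answer suggests:
# fit <- Monthly_labor_data_small %>% model(linear = TSLM(form))
```

With many columns, only `cols` changes; the rest of the construction stays the same.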
