I'm building a multivariate time series model to make predictions about labor in the United States. The fpp3 package is excellent, but I don't see a notation for modeling the response on all the other variables at once.
For example, in linear regression, it's possible to do this:
library(tidyverse)
mtcars.lm <- lm(mpg ~ ., data = mtcars)
summary(mtcars.lm)
to model mpg on all the remaining variables, without having to write all the variables out explicitly. Is there something similar for time series using the fpp3 package?
For example, this returns an error:
library(tidyverse)
library(fpp3)
library(clock)
# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0000000001
All_Employees <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/All_Employees.csv', col_select = c(Label, Value), show_col_types = FALSE)
All_Employees <- All_Employees %>%
rename(Month = Label, Total_Employees = Value)
All_Employees <- All_Employees %>%
mutate(Month = yearmonth(Month)) %>%
as_tsibble(index = Month) %>%
mutate(Total_Employees_Diff = difference(Total_Employees))
index = All_Employees$Month
All_Employees <- All_Employees %>%
filter((Month >= start_month), (Month <= end_month))
# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0500000003
Average_Hourly_Earnings <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/Average_Hourly_Earnings.csv', col_select = c(Label, Value), show_col_types = FALSE)
Average_Hourly_Earnings <- Average_Hourly_Earnings %>%
rename(Month = Label, Avg_Hourly_Earnings = Value)
Average_Hourly_Earnings <- Average_Hourly_Earnings %>%
mutate(Month = yearmonth(Month)) %>%
as_tsibble(index = Month) %>%
mutate(Avg_Hourly_Earnings_Diff = difference(Avg_Hourly_Earnings))
Average_Hourly_Earnings <- Average_Hourly_Earnings %>%
filter((Month >= start_month), (Month <= end_month))
Monthly_labor_data_small <-
tsibble(
Month = All_Employees$Month,
index = Month,
'Total_Employees' = All_Employees$Total_Employees,
'Avg_Earnings' = Average_Hourly_Earnings$Avg_Hourly_Earnings
)
start_month_small = yearmonth("2020 Mar")
end_month_small = yearmonth("2022 Jan")
Monthly_labor_data_small <- Monthly_labor_data_small %>%
filter((Month >= start_month_small), (Month <= end_month_small))
Monthly_labor_data_small %>%
model(
linear = TSLM(Total_Employees ~ .,))
The error is: Error in TSLM(Total_Employees ~ ., ) : unused argument (alist())
But this runs fine if I list everything out:
fit <- Monthly_labor_data_small %>%
  model(
    linear = TSLM(Total_Employees ~ Avg_Earnings + season() + trend()))
report(fit)
The full tsibble will have a large number of columns. Is there a short way to include all of them, similar to what can be done in linear regression?
Answer:
You should be able to do something like
resp <- "Total_Employees"
form <- reformulate(response = resp,
                    termlabels = c(setdiff(names(Monthly_labor_data_small), resp),
                                   "season()", "trend()"))
And then use form in your model. I haven't tried your examples. If there are other variables (like a time index) that should not be explicitly included in the model, then the second argument to setdiff() should be c(resp, "excluded_var2", "excluded_var3").
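To make the exclusion step concrete, here is a minimal sketch in base R only, so it runs without fpp3. The column names are stand-ins for a wider tsibble: Unemployment_Rate is a hypothetical extra column, and Month is assumed to be the index that must be left out of the predictors.

```r
# Stand-in for names(Monthly_labor_data_small) on a wider tsibble.
# "Unemployment_Rate" is a hypothetical column used only for illustration.
cols <- c("Month", "Total_Employees", "Avg_Earnings", "Unemployment_Rate")

resp      <- "Total_Employees"  # response variable
index_var <- "Month"            # time index: must not appear as a predictor

# Every column except the response and the index becomes a predictor
predictors <- setdiff(cols, c(resp, index_var))

# The specials are passed as strings; reformulate() parses them into terms
form <- reformulate(termlabels = c(predictors, "season()", "trend()"),
                    response = resp)

print(form)
```

With the real data you would replace cols with names(Monthly_labor_data_small) and then try Monthly_labor_data_small %>% model(linear = TSLM(form)). This assumes TSLM() accepts a formula stored in a variable, which I have not verified against your data.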