I have data with dates in YYYY-MM-DD format in one column and numeric values in the other.
Packages loaded:
library(forecast)
library(ggplot2)
library(readr)
Running str(my_data)
produces the following:
spec_tbl_df [261 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ date : Date[1:261], format: "2017-01-01" "2017-01-08" ...
$ popularity: num [1:261] 100 81 79 75 80 80 71 85 79 81 ...
- attr(*, "spec")=
.. cols(
.. date = col_date(format = ""),
.. popularity = col_double()
.. )
- attr(*, "problems")=<externalptr>
I would like to do some time series analysis on this. When I run the first line of code for this, decomp <- stl(log(my_data), s.window="periodic"),
I keep running into the following error:
Error in Math.data.frame(my_data) :
non-numeric-alike variable(s) in data frame: date
Originally my dates were in MM/DD/YYYY format, so I feel like I'm... barely closer. I'm learning R again, but it's been a while since I took a formal course in it. I did a cursory search here, but could not find anything that I could identify as helpful (I'm just an amateur).
CodePudding user response:
You currently have a data.frame (or a tibble variant thereof). That is not yet time-aware. You can do things like
library(ggplot2)
ggplot(data=df) + aes(x=date, y=popularity) + geom_line()
to get a basic line plot properly indexed by date.
You will have to look more closely at package forecast and the examples of the functions you want to use to predict or model. Packages like xts can help you, e.g.
library(xts)
x <- xts(df$popularity, order.by=df$date)
plot(x) # plot xts object
Besides plotting, you get time- and date-aware lags, leads, and subsetting. The rest depends more on what you want to do ... which you have not told us much about.
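To make the lag and subsetting remark concrete, here is a minimal sketch. The df below is made-up stand-in data (weekly dates from 2017-01-01 and random popularity values), not your real file, so only the calls themselves carry over.

```r
library(xts)

# Hypothetical stand-in for the real data: 261 weekly observations
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2017-01-01"), by = "week", length.out = 261),
  popularity = round(runif(261, min = 60, max = 100))
)

x <- xts(df$popularity, order.by = df$date)

# Date-based subsetting with ISO 8601 style strings
x_2018  <- x["2018"]              # every observation in 2018
x_range <- x["2019-01/2019-03"]   # January through March 2019

# Lag by one observation (one week here); the first value becomes NA
x_lag1 <- lag(x, k = 1)

# Change from the previous observation; the first value is NA as well
x_diff <- diff(x)
```

Note that lag() dispatches to the xts method here; if dplyr is also loaded, its own lag() may mask it, so stats::lag(x, k = 1) is the safer spelling in that case.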
Lastly, if you want to convert your dates to numbers (days since Jan 1, 1970), a quick as.numeric(df$date) will do; but using time-aware operations is often better (though it has the learning curve you see now...)
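Both options can be sketched side by side. Again df is made-up stand-in data, and frequency = 52 is an assumption on my part (a common approximation for weekly data; a year is really about 52.18 weeks). The error in the question appears to come from log() being applied to the whole data frame, date column included, so the sketch passes only the numeric column, wrapped in a ts object, to stl().

```r
# Hypothetical stand-in for the real data: 261 weekly observations
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2017-01-01"), by = "week", length.out = 261),
  popularity = round(runif(261, min = 60, max = 100))
)

# Option 1: dates as plain numbers (days since 1970-01-01)
days <- as.numeric(df$date)
days[1]   # 17167 for 2017-01-01

# Option 2: a time-aware ts object built from the numeric column only.
# frequency = 52 is an assumed approximation for weekly data.
y <- ts(df$popularity, start = c(2017, 1), frequency = 52)

# log() needs strictly positive values; check the real data for zeros first
decomp <- stl(log(y), s.window = "periodic")

# Seasonal, trend, and remainder components as a matrix
head(decomp$time.series)
```

Calling plot(decomp) afterwards should draw the usual four-panel decomposition. stl() needs at least two full periods, which 261 weekly observations comfortably cover.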