s
X Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 2012 24.78 26.82 29.75 31.19 31.87 31.00 28.48 27.39 27.08 26.55 24.36 23.62
2 2013 24.82 26.04 28.83 30.85 32.44 29.70 27.86 27.66 27.73 27.00 24.87 22.94
3 2014 24.01 25.75 29.08 31.83 31.23 33.08 29.88 28.14 27.40 27.11 25.38 24.37
4 2015 24.60 26.11 29.19 30.71 32.69 28.90 29.21 28.24 27.58 27.82 25.37 24.71
5 2016 25.20 27.62 29.51 31.86 32.34 28.64 27.98 28.36 27.12 26.51 25.69 25.12
6 2017 25.28 26.88 29.55 31.88 32.74 29.89 28.41 27.60 27.72 27.23 25.43 24.08
7 2018 24.84 26.47 29.40 31.20 31.10 30.28 28.30 27.33 27.55 27.40 26.98 24.77
8 2019 23.73 26.75 29.57 31.59 32.53 31.30 29.48 27.78 27.54 27.05 25.44 24.46
9 2020 25.41 26.75 29.30 31.37 32.98 30.05 28.23 27.53 27.68 27.01 25.57 22.86
10 2021 24.70 25.90 29.62 31.42 31.68 30.17 28.13 28.08 27.68 27.29 25.59 23.16
How can I convert this into a time series for forecasting?
CodePudding user response:
You can use the pivot_longer() function from the tidyr package to convert this into a longer format. Then the ts() function can convert it to a time series.
# recreate the original data
data1 <- structure(list(X=c(2012,2013,2014,2015,2016,2017,2018,2019,2020,2021),
Jan=c(24.78,24.82,24.01,24.6,25.2,25.28,24.84,23.73,25.41,24.7),
Feb=c(26.82,26.04,25.75,26.11,27.62,26.88,26.47,26.75,26.75,25.9),
Mar=c(29.75,28.83,29.08,29.19,29.51,29.55,29.4,29.57,29.3,29.62),
Apr=c(31.19,30.85,31.83,30.71,31.86,31.88,31.2,31.59,31.37,31.42),
May=c(31.87,32.44,31.23,32.69,32.34,32.74,31.1,32.53,32.98,31.68),
Jun=c(31,29.7,33.08,28.9,28.64,29.89,30.28,31.3,30.05,30.17),
Jul=c(28.48,27.86,29.88,29.21,27.98,28.41,28.3,29.48,28.23,28.13),
Aug=c(27.39,27.66,28.14,28.24,28.36,27.6,27.33,27.78,27.53,28.08),
Sep=c(27.08,27.73,27.4,27.58,27.12,27.72,27.55,27.54,27.68,27.68),
Oct=c(26.55,27,27.11,27.82,26.51,27.23,27.4,27.05,27.01,27.29),
Nov=c(24.36,24.87,25.38,25.37,25.69,25.43,26.98,25.44,25.57,25.59),
Dec=c(23.62,22.94,24.37,24.71,25.12,24.08,24.77,24.46,22.86,23.16)),
row.names=c(NA,-10L),
class=c("tbl_df","tbl","data.frame"))
# pivot to longer format
library(tidyr)
data2 <- pivot_longer(data1,-X,values_to='value')
# convert to monthly timeseries starting at Jan 2012 ending at Dec 2021
timeseries <- ts(data2$value,start=c(2012,1),end=c(2021,12),frequency=12)
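The approach relies on pivot_longer() emitting the values row by row (all months of one year, then the next year), which is the chronological order ts() expects. A minimal sketch to check this, using a hypothetical two-year table `wide` whose values encode their own year and month (these names and values are made up for illustration and are not part of the original data):

```r
library(tidyr)

# Hypothetical 2-year table: each value encodes year + month/100,
# e.g. March 2013 is stored as 2013.03
wide <- data.frame(X = 2012:2013)
for (m in 1:12) wide[[month.abb[m]]] <- wide$X + m / 100

# Same two steps as above: pivot to long format, then build a monthly series
long <- pivot_longer(wide, -X, values_to = "value")
x <- ts(long$value, start = c(2012, 1), frequency = 12)

# The observation stored at March 2013 should be the value 2013.03
mar2013 <- as.numeric(window(x, start = c(2013, 3), end = c(2013, 3)))
print(mar2013)
```

If the wide table were instead unravelled column by column, the months would be interleaved across years and this check would fail.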
CodePudding user response:
We assume the question is how to convert a data frame in the form shown in the Note at the end to a ts object. In particular, we assume that the only NAs are at the beginning, if the series does not start in January, or at the end, if it does not end in December.
Now, after removing the year column, transpose it using t(), unravel that into a vector using c(), and then specify the appropriate start year and frequency. Finally, we assume that if the series does not start in January then it starts with NAs, so remove them with na.omit(); if we knew it starts in January and ends in December we could optionally drop the na.omit(). No packages are used.
(If there are NAs at the beginning and/or end, the above will continue to work, but if there are also NAs internally then use na.trim() from zoo in place of na.omit().)
na.omit(ts(c(t(s[, -1])), start = s[1, 1], frequency = 12))
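To illustrate the NA handling described above, here is a small sketch on a hypothetical data frame `partial` (not the original data) whose first year starts in March and whose last year ends in October. The same one-liner is applied, and na.omit() trims the leading and trailing NAs while keeping the time index aligned:

```r
# Hypothetical table: 2020 has no Jan-Feb values, 2021 has no Nov-Dec values
vals <- matrix(NA_real_, nrow = 2, ncol = 12, dimnames = list(NULL, month.abb))
vals[1, 3:12] <- 1:10    # Mar 2020 .. Dec 2020
vals[2, 1:10] <- 11:20   # Jan 2021 .. Oct 2021
partial <- data.frame(X = 2020:2021, vals)

# Drop the year column, transpose, unravel row by row, build the series,
# then remove the NAs at both ends
tt <- na.omit(ts(c(t(partial[, -1])), start = partial[1, 1], frequency = 12))

start(tt)   # year 2020, month 3
end(tt)     # year 2021, month 10
length(tt)  # 20 observations remain
```

Starting the ts at January of the first year and trimming afterwards means the start month never has to be worked out by hand.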
Note
s <- structure(list(X = 2012:2021, Jan = c(24.78, 24.82, 24.01, 24.6,
25.2, 25.28, 24.84, 23.73, 25.41, 24.7), Feb = c(26.82, 26.04,
25.75, 26.11, 27.62, 26.88, 26.47, 26.75, 26.75, 25.9), Mar = c(29.75,
28.83, 29.08, 29.19, 29.51, 29.55, 29.4, 29.57, 29.3, 29.62),
Apr = c(31.19, 30.85, 31.83, 30.71, 31.86, 31.88, 31.2, 31.59,
31.37, 31.42), May = c(31.87, 32.44, 31.23, 32.69, 32.34,
32.74, 31.1, 32.53, 32.98, 31.68), Jun = c(31, 29.7, 33.08,
28.9, 28.64, 29.89, 30.28, 31.3, 30.05, 30.17), Jul = c(28.48,
27.86, 29.88, 29.21, 27.98, 28.41, 28.3, 29.48, 28.23, 28.13
), Aug = c(27.39, 27.66, 28.14, 28.24, 28.36, 27.6, 27.33,
27.78, 27.53, 28.08), Sep = c(27.08, 27.73, 27.4, 27.58,
27.12, 27.72, 27.55, 27.54, 27.68, 27.68), Oct = c(26.55,
27, 27.11, 27.82, 26.51, 27.23, 27.4, 27.05, 27.01, 27.29
), Nov = c(24.36, 24.87, 25.38, 25.37, 25.69, 25.43, 26.98,
25.44, 25.57, 25.59), Dec = c(23.62, 22.94, 24.37, 24.71,
25.12, 24.08, 24.77, 24.46, 22.86, 23.16)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))