I have a data set that will be used for time series analysis. The date column is currently structured as follows:
> head(cam_shiller)
div stock dates
1 0.495 7.09 1933m1
2 0.490 6.25 1933m2
3 0.485 6.23 1933m3
4 0.480 6.89 1933m4
5 0.475 8.87 1933m5
6 0.470 10.39 1933m6
If I'm not mistaken, monthly data for time series should look like this: yyyy-mm. So I'm trying to make my date column look like this:
div stock dates
1 0.495 7.09 1933-01
2 0.490 6.25 1933-02
3 0.485 6.23 1933-03
4 0.480 6.89 1933-04
5 0.475 8.87 1933-05
6 0.470 10.39 1933-06
However, using the as.yearmon function produces a column full of NAs. I tried removing the 'm' and replacing it with a dash, and then running as.yearmon again. Now the results look like this:
div stock dates
1 0.495 7.09 Jan 1933
2 0.490 6.25 Feb 1933
3 0.485 6.23 Mar 1933
4 0.480 6.89 Apr 1933
5 0.475 8.87 May 1933
6 0.470 10.39 Jun 1933
How do I change the dates into the yyyy-mm format?
library(zoo)
cam_shiller = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv')
cam_shiller$dates = gsub('m', '-', cam_shiller$dates)
cam_shiller$dates = as.yearmon(cam_shiller$dates)
CodePudding user response:
Try lubridate::ym to parse the dates, then strftime to format them as yyyy-mm.
library(tidyverse)
cam_shiller = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv')
cam_shiller %>%
  mutate(
    date = lubridate::ym(dates),
    date = strftime(date, "%Y-%m")
  ) %>%
  head()
#> div stock dates date
#> 1 0.495 7.09 1933m1 1933-01
#> 2 0.490 6.25 1933m2 1933-02
#> 3 0.485 6.23 1933m3 1933-03
#> 4 0.480 6.89 1933m4 1933-04
#> 5 0.475 8.87 1933m5 1933-05
#> 6 0.470 10.39 1933m6 1933-06
Created on 2022-10-01 with reprex v2.0.2
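If you would rather avoid extra packages, the same conversion can be sketched in base R. This is only an illustration on a few sample values in the same format as the dates column; the variable names as_date and yyyymm are made up for the example.

```r
# Base R only; the sample values use the same "yyyymM" layout as cam_shiller$dates.
dates <- c("1933m1", "1933m2", "1933m10", "1934m12")

# Append a pseudo day ("m01") so as.Date() has a complete date to parse.
as_date <- as.Date(paste0(dates, "m01"), format = "%Ym%mm%d")

# Format back to a zero-padded yyyy-mm character vector.
yyyymm <- format(as_date, "%Y-%m")
print(yyyymm)
```

Note that the result is a character vector, not a date class, so it is mainly useful for display or as a sortable label. If you stay with zoo, passing an explicit format such as as.yearmon(dates, "%Ym%m") should also avoid the NAs, since the default parsing does not expect the 'm' separator.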
CodePudding user response:
Actually, with ts you just need to specify start= and frequency=.
res <- ts(cam_shiller[, -3], start=1933, frequency=12)
res
# div stock
# Jan 1933 0.4950 7.09
# Feb 1933 0.4900 6.25
# Mar 1933 0.4850 6.23
# Apr 1933 0.4800 6.89
# May 1933 0.4750 8.87
# Jun 1933 0.4700 10.39
# Jul 1933 0.4650 11.23
# Aug 1933 0.4600 10.67
# Sep 1933 0.4550 10.58
# Oct 1933 0.4500 9.55
# Nov 1933 0.4450 9.78
# Dec 1933 0.4400 9.97
# Jan 1934 0.4408 10.54
# Feb 1934 0.4417 11.32
# Mar 1934 0.4425 10.74
# Apr 1934 0.4433 10.92
# May 1934 0.4442 9.81
# Jun 1934 0.4450 9.94
# Jul 1934 0.4458 9.47
# Aug 1934 0.4467 9.10
# Sep 1934 0.4475 8.88
# Oct 1934 0.4483 8.95
# Nov 1934 0.4492 9.20
# Dec 1934 0.4500 9.26
# ...
Or
ts(cam_shiller$stock, start=c(1933, 1), frequency=12)
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1933 7.09 6.25 6.23 6.89 8.87 10.39 11.23 10.67 10.58 9.55 9.78 9.97
# 1934 10.54 11.32 10.74 10.92 9.81 9.94 9.47 9.10 8.88 8.95 9.20 9.26
# 1935 9.26 8.98 8.41 9.04 9.75 10.12 10.65 11.37 11.61 11.92 13.04 13.04
# ...
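Once the series is a ts object, the yyyy-mm labels can be rebuilt from its time attributes if you still want them, and subsetting by period works without any date column. A small sketch, assuming only the first six stock values shown above; the names x, labels and sub are illustrative.

```r
# First six monthly stock values from the question's data.
stock <- c(7.09, 6.25, 6.23, 6.89, 8.87, 10.39)
x <- ts(stock, start = c(1933, 1), frequency = 12)

# time() gives fractional years, cycle() gives the month position (1..12).
# A tiny epsilon guards floor() against floating-point rounding.
labels <- sprintf("%d-%02d",
                  as.integer(floor(time(x) + 1e-8)),
                  as.integer(cycle(x)))
print(labels)

# Subset March to May 1933 using (year, month) pairs.
sub <- window(x, start = c(1933, 3), end = c(1933, 5))
print(sub)
```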
It may be wise to check beforehand that there are no gaps in the data by evaluating the column variances of the year matrix and the row variances of the month matrix (all should be zero):
test <- do.call(rbind, strsplit(cam_shiller$dates, 'm')) |>
  type.convert(as.is=TRUE)
matrixStats::colVars(matrix(test[, 1], 12))
# [1] 0 0 ...
matrixStats::rowVars(matrix(test[, 2], 12))
# [1] 0 0 0 0 0 0 0 0 0 0 0 0
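The variance check above appears to rely on the data starting in January and covering whole years, because it reshapes the values into 12-row matrices. A possible alternative that does not need that assumption is to turn each date into a running month index and confirm that consecutive rows differ by exactly one. This is a sketch in base R on made-up sample values; idx and no_gaps are illustrative names.

```r
# Sample values spanning a year boundary, same layout as cam_shiller$dates.
dates <- c("1933m11", "1933m12", "1934m1", "1934m2")

# Split into year and month parts.
parts <- do.call(rbind, strsplit(dates, "m", fixed = TRUE))

# Running month index: year * 12 + month.
idx <- as.integer(parts[, 1]) * 12L + as.integer(parts[, 2])

# TRUE when every row is exactly one month after the previous one.
no_gaps <- all(diff(idx) == 1L)
print(no_gaps)

# Dropping one month should make the check fail.
has_gap <- !all(diff(idx[-2]) == 1L)
print(has_gap)
```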
If you use xts::xts, it is rather picky, since it wants a time-based class such as "Date" or "POSIXct". So you need whole dates, i.e. paste a 01 onto each value as a pseudo day.
res <- transform(cam_shiller, dates=strptime(paste(dates, '01'), format='%Ym%m %d')) |>
  {\(.) xts::as.xts(.[1:2], .$dates)}()
head(res)
# div stock
# 1933-01-01 0.495 7.09
# 1933-02-01 0.490 6.25
# 1933-03-01 0.485 6.23
# 1933-04-01 0.480 6.89
# 1933-05-01 0.475 8.87
# 1933-06-01 0.470 10.39
class(res)
# [1] "xts" "zoo"
Data:
cam_shiller <- structure(list(div = c(0.495, 0.49, 0.485, 0.48, 0.475, 0.47,
0.465, 0.46, 0.455, 0.45, 0.445, 0.44, 0.4408, 0.4417, 0.4425,
0.4433, 0.4442, 0.445, 0.4458, 0.4467, 0.4475, 0.4483, 0.4492,
0.45), stock = c(7.09, 6.25, 6.23, 6.89, 8.87, 10.39, 11.23,
10.67, 10.58, 9.55, 9.78, 9.97, 10.54, 11.32, 10.74, 10.92, 9.81,
9.94, 9.47, 9.1, 8.88, 8.95, 9.2, 9.26), dates = c("1933m1",
"1933m2", "1933m3", "1933m4", "1933m5", "1933m6", "1933m7", "1933m8",
"1933m9", "1933m10", "1933m11", "1933m12", "1934m1", "1934m2",
"1934m3", "1934m4", "1934m5", "1934m6", "1934m7", "1934m8", "1934m9",
"1934m10", "1934m11", "1934m12")), row.names = c(NA, 24L), class = "data.frame")