change date to yyyy-mm in R-CodePudding

I have a data set that will be used for time series. the date column is currently structured as follows:

> head(cam_shiller)
    div stock  dates
1 0.495  7.09 1933m1
2 0.490  6.25 1933m2
3 0.485  6.23 1933m3
4 0.480  6.89 1933m4
5 0.475  8.87 1933m5
6 0.470 10.39 1933m6

If I'm not mistaken, monthly data for time series should look like this: yyyy-mm. So I'm trying to make my date column look like this:

    div stock  dates
1 0.495  7.09 1933-01
2 0.490  6.25 1933-02
3 0.485  6.23 1933-03
4 0.480  6.89 1933-04
5 0.475  8.87 1933-05
6 0.470 10.39 1933-06

However, using the as.yearmo function produces a column full of NAs. I tried removing the 'm' and replacing it with a dash, and then running as.yearmo again. Now the results look like this:

    div stock    dates
1 0.495  7.09 Jan 1933
2 0.490  6.25 Feb 1933
3 0.485  6.23 Mar 1933
4 0.480  6.89 Apr 1933
5 0.475  8.87 May 1933
6 0.470 10.39 Jun 1933

How do I change the dates into the yyyy-mm format?

library(zoo)

cam_shiller = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv')

cam_shiller$dates = gsub('m', '-', cam_shiller$dates)
cam_shiller$dates = as.yearmon(cam_shiller$dates)

CodePudding user response：

Try lubridate::ym to change dates to yyyy-mm format

library(tidyverse)

cam_shiller = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv')

cam_shiller %>% 
  mutate(
    date = lubridate::ym(dates),
    date = strftime(date, "%Y-%m")
  ) %>% 
  head()

#>     div stock  dates    date
#> 1 0.495  7.09 1933m1 1933-01
#> 2 0.490  6.25 1933m2 1933-02
#> 3 0.485  6.23 1933m3 1933-03
#> 4 0.480  6.89 1933m4 1933-04
#> 5 0.475  8.87 1933m5 1933-05
#> 6 0.470 10.39 1933m6 1933-06

^{Created on 2022-10-01 with reprex v2.0.2}

CodePudding user response：

Actually, in ts you just need to specify start= and frequency.

res <- ts(cam_shiller[, -3], start=1933, frequency=12)
res
#             div stock
# Jan 1933 0.4950  7.09
# Feb 1933 0.4900  6.25
# Mar 1933 0.4850  6.23
# Apr 1933 0.4800  6.89
# May 1933 0.4750  8.87
# Jun 1933 0.4700 10.39
# Jul 1933 0.4650 11.23
# Aug 1933 0.4600 10.67
# Sep 1933 0.4550 10.58
# Oct 1933 0.4500  9.55
# Nov 1933 0.4450  9.78
# Dec 1933 0.4400  9.97
# Jan 1934 0.4408 10.54
# Feb 1934 0.4417 11.32
# Mar 1934 0.4425 10.74
# Apr 1934 0.4433 10.92
# May 1934 0.4442  9.81
# Jun 1934 0.4450  9.94
# Jul 1934 0.4458  9.47
# Aug 1934 0.4467  9.10
# Sep 1934 0.4475  8.88
# Oct 1934 0.4483  8.95
# Nov 1934 0.4492  9.20
# Dec 1934 0.4500  9.26
# ...

ts(cam_shiller$stock, start=c(1933, 1), frequency=12)
#  Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
# 1933   7.09   6.25   6.23   6.89   8.87  10.39  11.23  10.67  10.58   9.55   9.78   9.97
# 1934  10.54  11.32  10.74  10.92   9.81   9.94   9.47   9.10   8.88   8.95   9.20   9.26
# 1935   9.26   8.98   8.41   9.04   9.75  10.12  10.65  11.37  11.61  11.92  13.04  13.04
# ...

It may be wise to check beforehand that there are no gaps in the data by evaluating the column and row variances of years and month matrices:

test <- do.call(rbind, strsplit(cam_shiller$dates, 'm')) |>
  type.convert(as.is=TRUE) 
matrixStats::colVars(matrix(test[, 1], 12))
#  [1] 0 0 ...
matrixStats::rowVars(matrix(test[, 2], 12))
# [1] 0 0 0 0 0 0 0 0 0 0 0 0

If you use the xts::xts, it's rather picky since it wants a time-based class such as "Date" or "POSIXct". So you need whole dates, i.e. paste a 01 as pseudo date.

res <- transform(cam_shiller, dates=strptime(paste(dates, '01'), format='%Ym%m %d')) |>
  {\(.) xts::as.xts(.[1:2], .$dates)}()
head(res)
#              div stock
# 1933-01-01 0.495  7.09
# 1933-02-01 0.490  6.25
# 1933-03-01 0.485  6.23
# 1933-04-01 0.480  6.89
# 1933-05-01 0.475  8.87
# 1933-06-01 0.470 10.39

class(res)
# [1] "xts" "zoo"

Data:

cam_shiller <- structure(list(div = c(0.495, 0.49, 0.485, 0.48, 0.475, 0.47, 
0.465, 0.46, 0.455, 0.45, 0.445, 0.44, 0.4408, 0.4417, 0.4425, 
0.4433, 0.4442, 0.445, 0.4458, 0.4467, 0.4475, 0.4483, 0.4492, 
0.45), stock = c(7.09, 6.25, 6.23, 6.89, 8.87, 10.39, 11.23, 
10.67, 10.58, 9.55, 9.78, 9.97, 10.54, 11.32, 10.74, 10.92, 9.81, 
9.94, 9.47, 9.1, 8.88, 8.95, 9.2, 9.26), dates = c("1933m1", 
"1933m2", "1933m3", "1933m4", "1933m5", "1933m6", "1933m7", "1933m8", 
"1933m9", "1933m10", "1933m11", "1933m12", "1934m1", "1934m2", 
"1934m3", "1934m4", "1934m5", "1934m6", "1934m7", "1934m8", "1934m9", 
"1934m10", "1934m11", "1934m12")), row.names = c(NA, 24L), class = "data.frame")