Home > Software design >  Convert ts object to dataframe with Date/POSIXct column (base R)
Convert ts object to dataframe with Date/POSIXct column (base R)

Time:12-11

I have a time series that I want to transform to a dataframe with column(s) containing the values and one Date/ POSIXct column with the ttime. Reproducible example:

data_ts <- ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 6)

Expected output is:

data_df <- data.frame(Series_1= 1:7, Series_2= 8:14, Date= seq(as.Date("2000/1/1"), length.out= nrow(data_ts)*2, by= "month")[seq(1, nrow(data_ts)*2, 2)])

data_df
Series_1 Series_2       Date
1        1        8 2000-01-01
2        2        9 2000-03-01
3        3       10 2000-05-01
4        4       11 2000-07-01
5        5       12 2000-09-01
6        6       13 2000-11-01
7        7       14 2001-01-01

In this example the frequency is 6 and start is at c(2000, 1) but in my use case I do not know those parameters in beforehand. In fact, the frequency could also be so big that I refers to hours/ minutes or seconds (frequency= 365*24*60*60, for example). If frequency is in years/ month/ weeks/ days class Date is enough but for hours/ minutes/ seconds the column must changes, of course (to POSIXct). So I am trying a general solution. I want it to be base R.

There is a similar question but the answer has a column with dates as decimal numbers. Instead I need a actual Date/PSIXct column.

CodePudding user response:

I came up with an approach which uses the seq function if the frequency is in full month, weeks or days, choosing the corresponding by argument ("month", "week" or "day"). For cases where frequency is not full month, week or day I calculated to how many seconds the frequency corresponds. For example, a frequency of 365*24 means that we need to go in 60*60=3600 second steps (because 365*24 means hours and 60*60 second steps are hour steps), and so on.

This all gives the following rather long solution, but it works in all cases I've tried so far:

ts_to_df <- function(data){
ts_freq <- frequency(data)
# Works for univariate and multivariate time series:
if(is.null(dim(data))){
ts_length <- length(data)
} else{
ts_length <- nrow(data)
}

# For full month.
if(ts_freq %in% c(1, 2, 3, 4, 6, 12)){
n_months <- 12/ ts_freq
first_month <- as.Date(paste(start(data)[1], 
start(data)[2]*n_months-(n_months-1), "1", sep= "/"))
ts_date <- seq(first_month, length.out= ts_length*n_months, by= "month")
ts_date <- ts_date[seq(1, length(ts_date), by= n_months)]

# For full weeks.
} else if(ts_freq == 52){
first_week <- seq(as.Date(paste(start(data)[1], "/1/1", sep="")), length.out= start(data)[2], by= "week")[start(data)[2]]
ts_date <- seq(first_week, length.out= ts_length, by= "week")

# For full days.
} else if(ts_freq == 365 | ts_freq == 366){
first_day <- as.Date(paste(start(data)[1], "/1/", start(data)[2], sep=""))
ts_date <- seq(first_day, length.out= ts_length, by= "day")

# All other cases.
} else{
sec_year <- 365*24*60*60
freq_fraction <- 1/ts_freq
sec_steps <- sec_year*freq_fraction
first_sec <- as.POSIXct(paste(start(data)[1], "-01-01 00:00:01", sep=""))   sec_steps*start(data)[2] - sec_steps
ts_date <- first_sec   sec_steps *0:(ts_length-1)
}

# Make a data.frame.
data_df <- data.frame(as.matrix(data))
data_df$date <- ts_date
data_df
}

Applying the function to a ts with weeks as frequency:

ts_to_df(ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 52))
Series.1 Series.2       date
1        1        8 2000-01-01
2        2        9 2000-01-08
3        3       10 2000-01-15
4        4       11 2000-01-22
5        5       12 2000-01-29
6        6       13 2000-02-05
7        7       14 2000-02-12

Now to a ts that has minutes as frequency and start from the 10th minute of 2009:

ts_to_df(ts(matrix(1:14, ncol= 2), start= c(2009, 10), frequency= 365*24*60))
Series.1 Series.2                date
1        1        8 2009-01-01 00:09:01
2        2        9 2009-01-01 00:10:01
3        3       10 2009-01-01 00:11:01
4        4       11 2009-01-01 00:12:01
5        5       12 2009-01-01 00:13:01
6        6       13 2009-01-01 00:14:01
7        7       14 2009-01-01 00:15:01

And so on...

CodePudding user response:

I think easiest way is with tsbox and base R data.frames.

library(tsbox)
data_ts = ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 6)

ts_df = ts_df(data_ts)
ts_df = dcast(ts_df, time~id, value.var=value)

Output:

> ts_dt
         time Series 1 Series 2
1: 2000-01-01        1        8
2: 2000-03-01        2        9
3: 2000-05-01        3       10
4: 2000-07-01        4       11
5: 2000-09-01        5       12
6: 2000-11-01        6       13
7: 2001-01-01        7       14

If you would use just base R I don't know a very handy solution, but there are better R coders in this community than me and maybe one knows a solution without dozens of rows.

Update:

An attempt just with base R. If having months:

data_ts = ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 6)

df_ts = data.frame(data_ts)

df_ts$date = as.numeric(time(data_ts))
df_ts$date = as.Date(paste0(floor(df_ts$date), "-", 
                            sprintf("d", 1 round((df_ts$date-floor(df_ts$date))*12)), "-01"))

Output:

> df_ts
  Series.1 Series.2       date
1        1        8 2000-01-01
2        2        9 2000-03-01
3        3       10 2000-05-01
4        4       11 2000-07-01
5        5       12 2000-09-01
6        6       13 2000-11-01
7        7       14 2001-01-01

If having weeks:

df_ts = data.frame(data_ts)

df_ts$week = as.numeric(time(data_ts))
df_ts$week = paste0(floor(df_ts$week), "-", 
                            sprintf("d", 1 round((df_ts$week-floor(df_ts$week))*52)))

Output:

   Series.1 Series.2    week
1         1       15 2000-01
2         2       16 2000-02
3         3       17 2000-03
4         4       18 2000-04
5         5       19 2000-05
6         6       20 2000-06
7         7       21 2000-07
8         8       22 2000-08
9         9       23 2000-09
10       10       24 2000-10
11       11       25 2000-11
12       12       26 2000-12
13       13       27 2000-13
14       14       28 2000-14

If having days:

data_ts = ts(matrix(1:20, ncol= 2), start= c(2000, 1, 1), frequency= 365)

df_ts = data.frame(data_ts)

df_ts$date = time(data_ts)
df_ts$date = as.Date(paste0(floor(df_ts$date), 
                            "-", 
                            sprintf("d", 1 round((df_ts$date-floor(df_ts$date))*365)), 
                            "-01"), "%Y-%j")

Output:

> df_ts
   Series.1 Series.2       date
1         1       11 2000-01-01
2         2       12 2000-01-02
3         3       13 2000-01-03
4         4       14 2000-01-04
5         5       15 2000-01-05
6         6       16 2000-01-06
7         7       17 2000-01-07
8         8       18 2000-01-08
9         9       19 2000-01-09
10       10       20 2000-01-10
  • Related