I have a time series that I want to transform to a dataframe with column(s) containing the values and one Date/ POSIXct column with the ttime. Reproducible example:
data_ts <- ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 6)
Expected output is:
data_df <- data.frame(Series_1= 1:7, Series_2= 8:14, Date= seq(as.Date("2000/1/1"), length.out= nrow(data_ts)*2, by= "month")[seq(1, nrow(data_ts)*2, 2)])
data_df
Series_1 Series_2 Date
1 1 8 2000-01-01
2 2 9 2000-03-01
3 3 10 2000-05-01
4 4 11 2000-07-01
5 5 12 2000-09-01
6 6 13 2000-11-01
7 7 14 2001-01-01
In this example the frequency
is 6
and start is at c(2000, 1)
but in my use case I do not know those parameters in beforehand. In fact, the frequency
could also be so big that I refers to hours/ minutes or seconds (frequency= 365*24*60*60
, for example). If frequency is in years/ month/ weeks/ days class Date
is enough but for hours/ minutes/ seconds the column must changes, of course (to POSIXct
). So I am trying a general solution. I want it to be base R.
There is a similar question but the answer has a column with dates as decimal numbers. Instead I need a actual Date/PSIXct column.
CodePudding user response:
I came up with an approach which uses the seq
function if the frequency is in full month, weeks or days, choosing the corresponding by
argument ("month"
, "week"
or "day"
). For cases where frequency is not full month, week or day I calculated to how many seconds the frequency corresponds. For example, a frequency of 365*24
means that we need to go in 60*60=3600
second steps (because 365*24
means hours and 60*60
second steps are hour steps), and so on.
This all gives the following rather long solution, but it works in all cases I've tried so far:
ts_to_df <- function(data){
ts_freq <- frequency(data)
# Works for univariate and multivariate time series:
if(is.null(dim(data))){
ts_length <- length(data)
} else{
ts_length <- nrow(data)
}
# For full month.
if(ts_freq %in% c(1, 2, 3, 4, 6, 12)){
n_months <- 12/ ts_freq
first_month <- as.Date(paste(start(data)[1],
start(data)[2]*n_months-(n_months-1), "1", sep= "/"))
ts_date <- seq(first_month, length.out= ts_length*n_months, by= "month")
ts_date <- ts_date[seq(1, length(ts_date), by= n_months)]
# For full weeks.
} else if(ts_freq == 52){
first_week <- seq(as.Date(paste(start(data)[1], "/1/1", sep="")), length.out= start(data)[2], by= "week")[start(data)[2]]
ts_date <- seq(first_week, length.out= ts_length, by= "week")
# For full days.
} else if(ts_freq == 365 | ts_freq == 366){
first_day <- as.Date(paste(start(data)[1], "/1/", start(data)[2], sep=""))
ts_date <- seq(first_day, length.out= ts_length, by= "day")
# All other cases.
} else{
sec_year <- 365*24*60*60
freq_fraction <- 1/ts_freq
sec_steps <- sec_year*freq_fraction
first_sec <- as.POSIXct(paste(start(data)[1], "-01-01 00:00:01", sep="")) sec_steps*start(data)[2] - sec_steps
ts_date <- first_sec sec_steps *0:(ts_length-1)
}
# Make a data.frame.
data_df <- data.frame(as.matrix(data))
data_df$date <- ts_date
data_df
}
Applying the function to a ts
with weeks as frequency:
ts_to_df(ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 52))
Series.1 Series.2 date
1 1 8 2000-01-01
2 2 9 2000-01-08
3 3 10 2000-01-15
4 4 11 2000-01-22
5 5 12 2000-01-29
6 6 13 2000-02-05
7 7 14 2000-02-12
Now to a ts
that has minutes as frequency and start from the 10th minute of 2009:
ts_to_df(ts(matrix(1:14, ncol= 2), start= c(2009, 10), frequency= 365*24*60))
Series.1 Series.2 date
1 1 8 2009-01-01 00:09:01
2 2 9 2009-01-01 00:10:01
3 3 10 2009-01-01 00:11:01
4 4 11 2009-01-01 00:12:01
5 5 12 2009-01-01 00:13:01
6 6 13 2009-01-01 00:14:01
7 7 14 2009-01-01 00:15:01
And so on...
CodePudding user response:
I think easiest way is with tsbox and base R data.frames.
library(tsbox)
data_ts = ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 6)
ts_df = ts_df(data_ts)
ts_df = dcast(ts_df, time~id, value.var=value)
Output:
> ts_dt
time Series 1 Series 2
1: 2000-01-01 1 8
2: 2000-03-01 2 9
3: 2000-05-01 3 10
4: 2000-07-01 4 11
5: 2000-09-01 5 12
6: 2000-11-01 6 13
7: 2001-01-01 7 14
If you would use just base R I don't know a very handy solution, but there are better R coders in this community than me and maybe one knows a solution without dozens of rows.
Update:
An attempt just with base R. If having months:
data_ts = ts(matrix(1:14, ncol= 2), start= c(2000, 1), frequency= 6)
df_ts = data.frame(data_ts)
df_ts$date = as.numeric(time(data_ts))
df_ts$date = as.Date(paste0(floor(df_ts$date), "-",
sprintf("d", 1 round((df_ts$date-floor(df_ts$date))*12)), "-01"))
Output:
> df_ts
Series.1 Series.2 date
1 1 8 2000-01-01
2 2 9 2000-03-01
3 3 10 2000-05-01
4 4 11 2000-07-01
5 5 12 2000-09-01
6 6 13 2000-11-01
7 7 14 2001-01-01
If having weeks:
df_ts = data.frame(data_ts)
df_ts$week = as.numeric(time(data_ts))
df_ts$week = paste0(floor(df_ts$week), "-",
sprintf("d", 1 round((df_ts$week-floor(df_ts$week))*52)))
Output:
Series.1 Series.2 week
1 1 15 2000-01
2 2 16 2000-02
3 3 17 2000-03
4 4 18 2000-04
5 5 19 2000-05
6 6 20 2000-06
7 7 21 2000-07
8 8 22 2000-08
9 9 23 2000-09
10 10 24 2000-10
11 11 25 2000-11
12 12 26 2000-12
13 13 27 2000-13
14 14 28 2000-14
If having days:
data_ts = ts(matrix(1:20, ncol= 2), start= c(2000, 1, 1), frequency= 365)
df_ts = data.frame(data_ts)
df_ts$date = time(data_ts)
df_ts$date = as.Date(paste0(floor(df_ts$date),
"-",
sprintf("d", 1 round((df_ts$date-floor(df_ts$date))*365)),
"-01"), "%Y-%j")
Output:
> df_ts
Series.1 Series.2 date
1 1 11 2000-01-01
2 2 12 2000-01-02
3 3 13 2000-01-03
4 4 14 2000-01-04
5 5 15 2000-01-05
6 6 16 2000-01-06
7 7 17 2000-01-07
8 8 18 2000-01-08
9 9 19 2000-01-09
10 10 20 2000-01-10