Why does subsetting a data frame change the class of a time series?

I have a data frame like this:

some_ts <- ts(1:12, frequency= 12, start= c(2020, 1))
as_df <- data.frame(time= time(some_ts), val= as.matrix(some_ts))
str(as_df)
'data.frame':   12 obs. of  2 variables:
 $ time: Time-Series  from 2020 to 2021: 2020 2020 2020 2020 2020 ...
 $ val : int  1 2 3 4 5 6 7 8 9 10 ...

Here everything is fine. But as soon as I subset the data, the class of the time column changes from ts to numeric:

as_df <- as_df[as_df$val != 4, ]
str(as_df)
'data.frame':   11 obs. of  2 variables:
 $ time: num  2020 2020 2020 2020 2020 ...
 $ val : int  1 2 3 5 6 7 8 9 10 11 ...

How can I subset the data without changing the class of the time column?

CodePudding user response:

It is possible to override the default behaviour of the ts subsetting method, which is the function stats:::`[.ts`. If you define the following function instead:

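`[.ts` <- function(x, i, j, drop = TRUE)
{
  # No row index supplied: return the series unchanged
  if (missing(i)) return(x)
  # unclass() avoids re-dispatching to this method and, unlike as.numeric(),
  # keeps the dim attribute, so the y[i, j] branch works for multivariate series
  y <- unclass(x)
  if (missing(j)) y <- y[i] else y <- y[i, j, drop = drop]
  # Rebuild a ts that reuses the original start and frequency
  ts(y, start = start(x), frequency = frequency(x))
}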

Then subsetting a time series will result in a time series:

df <- data.frame(a= 1:3, b= ts(1:3))

df$b[df$a != 2]
#> Time Series:
#> Start = 1 
#> End = 2 
#> Frequency = 1 
#> [1] 1 3

But now you may see why keeping the class when subsetting a time series is not necessarily a good idea, and why the authors chose to drop it. The result has kept its start and frequency but is one observation shorter, so it appears to cover times 1 to 2 rather than 1 to 3, which is probably not what you wanted. What is the implied date of the value 3? It was observed at time 3, but the new series places it at time 2.
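
To make the mislabelling concrete, here is a minimal sketch using the monthly series from the question. It rebuilds the series by hand in the same way the overriding method above does (same start and frequency, shorter data), so it does not depend on that override being defined. The names vals and rebuilt are illustrative only.

```r
some_ts <- ts(1:12, frequency = 12, start = c(2020, 1))

# Drop April's value, then rebuild with the original start and frequency
vals <- as.vector(some_ts)
rebuilt <- ts(vals[vals != 4], start = start(some_ts),
              frequency = frequency(some_ts))

end(some_ts)   # the original series ends in period 12 (December) of 2020
end(rebuilt)   # the rebuilt series ends one period earlier

# Month position now attached to the value 12 (observed in December)
as.vector(cycle(rebuilt))[as.vector(rebuilt) == 12]
```

Every observation after the removed one is shifted one month earlier, so the value observed in December ends up labelled as November.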

There is too much ambiguity about what the user intends here. Should the frequency drop? Should NA values be inserted for the missing months? What if the subset is irregular or random? What would the time series look like?
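
Two ways of keeping the ts class without that ambiguity, sketched with base R only and the series from the question: replace the unwanted value with NA so the series stays regular, or keep a contiguous block with window(). Whether either fits depends on what you need the subset for; masked and later are illustrative names.

```r
some_ts <- ts(1:12, frequency = 12, start = c(2020, 1))

# Option 1: mask the unwanted value instead of removing it.
# Element replacement keeps the ts attributes, so the time index is untouched.
masked <- some_ts
masked[as.vector(some_ts) == 4] <- NA
class(masked)
tsp(masked)

# Option 2: keep only a contiguous stretch of the series.
# window() returns a ts whose start is adjusted to the new first observation.
later <- window(some_ts, start = c(2020, 5))
class(later)
start(later)
```

In both cases every remaining value keeps the date it was observed on, which is the property the overriding method cannot guarantee.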

If I were designing this class, I would probably also play it safe and drop the class. Perhaps you can see a better design?