R - preprocessing of time series data

Time:08-11

I have the following data structure, with Stocks S, having features f:

year S1_f1  S1_f2 S2_f1 S2_f2 S3_f1 S3_f2 Sn_f1 Sn_f2
2011   0.1    0.4  0.12  0.42   0.2   0.5     n     n
2012   0.4    0.7  0.42  0.72   0.5   0.8     n     n
2013   0.7    0.9  0.72   0.5   0.8   0.9     n     n
n        n      n     n     n     n     n     n     n

My original df has 10 observations but 50k predictors - so I want to generate more balance on the observation side.

Hence, I want to have the following dataframe:

year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2 Sn_f1 Sn_f2
2011   0.1   0.4     0     0     0     0     0     0
2012   0.4   0.7     0     0     0     0     0     0
2013   0.7   0.9     0     0     0     0     0     0
2011     0     0  0.12  0.42     0     0     0     0
2012     0     0  0.42  0.72     0     0     0     0
2013     0     0  0.72   0.5     0     0     0     0
2011     0     0     0     0   0.2   0.5     0     0
2012     0     0     0     0   0.5   0.8     0     0
2013     0     0     0     0   0.8   0.9     0     0
n        0     0     0     0     0     0     n     n

...and so on (example values).

I want to artificially multiply my timestamps via this approach.

Is there an elegant way to do this?

CodePudding user response:

You can convert what you have into what you want using the following code:

library(data.table)
dcast(
  melt(setDT(s), id="year")[, grp:=gsub("_.*$","",variable)],
  year + grp ~ variable,
  value.var="value"
  )[order(grp,year)]

Output:

    year    grp S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
   <int> <char> <num> <num> <num> <num> <num> <num>
1:  2011     S1   0.1   0.4    NA    NA    NA    NA
2:  2012     S1   0.4   0.7    NA    NA    NA    NA
3:  2013     S1   0.7   0.9    NA    NA    NA    NA
4:  2011     S2    NA    NA  0.12  0.42    NA    NA
5:  2012     S2    NA    NA  0.42  0.72    NA    NA
6:  2013     S2    NA    NA  0.72  0.50    NA    NA
7:  2011     S3    NA    NA    NA    NA   0.2   0.5
8:  2012     S3    NA    NA    NA    NA   0.5   0.8
9:  2013     S3    NA    NA    NA    NA   0.8   0.9

Input:

structure(list(year = 2011:2013, S1_f1 = c(0.1, 0.4, 0.7), S1_f2 = c(0.4, 
0.7, 0.9), S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 
0.5), S3_f1 = c(0.2, 0.5, 0.8), S3_f2 = c(0.5, 0.8, 0.9)), row.names = c(NA, 
-3L), class = "data.frame")
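The output above has NA where the question asks for 0, and it carries an extra grp column. A minimal sketch of one way to get closer to the requested shape, assuming the same input s and using the fill argument of data.table's dcast (the names long and wide are only illustrative):

```r
library(data.table)

# Same sample input as above
s <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Long format: one row per (year, column) pair
long <- melt(as.data.table(s), id.vars = "year")

# The stock prefix (everything before the first underscore) defines the group
long[, grp := sub("_.*$", "", variable)]

# Back to wide format; fill = 0 puts zeros in cells that belong to other stocks
wide <- dcast(long, year + grp ~ variable, value.var = "value", fill = 0)

# Order by stock, then year, and drop the helper column
setorder(wide, grp, year)
wide[, grp := NULL]

print(wide)
```

Note that setorder sorts grp as character, so with ten or more stocks "S10" would come before "S2".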

CodePudding user response:

One possible way to solve your problem (note that I did not convert the data, say df, into a data.table):

library(data.table)

result = sub("^S(\\d+)_.*", "\\1", names(df)[-1]) |> 
  unique() |> 
  lapply(function(i) df[sprintf(c("year", "S%s_f1", "S%s_f2"), i)]) |> 
  rbindlist(use.names=TRUE, fill=TRUE) |> 
  setnafill(fill=0)

    year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
   <int> <num> <num> <num> <num> <num> <num>
1:  2011   0.1   0.4  0.00  0.00   0.0   0.0
2:  2012   0.4   0.7  0.00  0.00   0.0   0.0
3:  2013   0.7   0.9  0.00  0.00   0.0   0.0
4:  2011   0.0   0.0  0.12  0.42   0.0   0.0
5:  2012   0.0   0.0  0.42  0.72   0.0   0.0
6:  2013   0.0   0.0  0.72  0.50   0.0   0.0
7:  2011   0.0   0.0  0.00  0.00   0.2   0.5
8:  2012   0.0   0.0  0.00  0.00   0.5   0.8
9:  2013   0.0   0.0  0.00  0.00   0.8   0.9
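The code above hardcodes two features (f1 and f2) per stock. If the number of features varies, the columns can be picked by prefix instead. This is only a sketch under the assumption that every predictor is named stock_feature; stocks, pieces and result are illustrative names:

```r
library(data.table)

df <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Stock prefixes in their original column order
stocks <- unique(sub("_.*$", "", names(df)[-1]))

# One data frame per stock: year plus every column starting with "<stock>_"
pieces <- lapply(stocks, function(s) {
  df[c("year", names(df)[startsWith(names(df), paste0(s, "_"))])]
})

# Stack the pieces; columns missing from a piece become NA
result <- rbindlist(pieces, use.names = TRUE, fill = TRUE)

# Replace those NAs by 0 in place, leaving the year column untouched
setnafill(result, fill = 0, cols = setdiff(names(result), "year"))

print(result)
```

Matching on the prefix followed by the underscore keeps "S1_" from also picking up columns of a stock named S10.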

CodePudding user response:

Using the sample data frame DF defined reproducibly in the Note at the end, create a vector g that assigns each column to a group; for the example it equals c("S1", "S1", "S2", "S2", "S3", "S3"). Then use it to split the columns into a list of matrices L, one matrix for each level of g. Apply .bdiag from the Matrix package to that list to create a block diagonal matrix, then insert the year column and set the column names. Note that Matrix is a recommended package that ships with standard R installations, so normally nothing extra has to be installed.

library(Matrix)

g <- sub("_.*", "", names(DF)[-1])
L <- tapply(as.list(DF[-1]), g, function(x) as.matrix(as.data.frame(x)))
setNames(data.frame(DF$year, as.matrix(.bdiag(L))), names(DF))

giving:

  year S1_f1 S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
1 2011   0.1   0.4  0.00  0.00   0.0   0.0
2 2012   0.4   0.7  0.00  0.00   0.0   0.0
3 2013   0.7   0.9  0.00  0.00   0.0   0.0
4 2011   0.0   0.0  0.12  0.42   0.0   0.0
5 2012   0.0   0.0  0.42  0.72   0.0   0.0
6 2013   0.0   0.0  0.72  0.50   0.0   0.0
7 2011   0.0   0.0  0.00  0.00   0.2   0.5
8 2012   0.0   0.0  0.00  0.00   0.5   0.8
9 2013   0.0   0.0  0.00  0.00   0.8   0.9

Note

Lines <- "
year S1_f1  S1_f2 S2_f1 S2_f2 S3_f1 S3_f2
2011   0.1    0.4  0.12  0.42   0.2   0.5
2012   0.4    0.7  0.42  0.72   0.5   0.8
2013   0.7    0.9  0.72   0.5   0.8   0.9"
DF <- read.table(text = Lines, header = TRUE)
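With the 50k predictors mentioned in the question, converting the block diagonal result to a dense matrix may not fit in memory, since almost every cell is zero. A hedged sketch of a variant that keeps the sparse matrix returned by .bdiag and stores the years in a separate vector (B and year are illustrative names; whether the downstream model accepts a sparse matrix is an assumption):

```r
library(Matrix)

DF <- data.frame(
  year  = 2011:2013,
  S1_f1 = c(0.1, 0.4, 0.7),    S1_f2 = c(0.4, 0.7, 0.9),
  S2_f1 = c(0.12, 0.42, 0.72), S2_f2 = c(0.42, 0.72, 0.5),
  S3_f1 = c(0.2, 0.5, 0.8),    S3_f2 = c(0.5, 0.8, 0.9)
)

# Group label per predictor column, kept in original column order
g <- sub("_.*", "", names(DF)[-1])
L <- lapply(split.default(DF[-1], factor(g, levels = unique(g))), as.matrix)

# Sparse block diagonal matrix: one block of rows per stock
B <- .bdiag(L)
dimnames(B) <- list(NULL, unlist(lapply(L, colnames), use.names = FALSE))

# Years repeated once per block, aligned with the rows of B
year <- rep(DF$year, times = length(L))

print(B)
```

Fixing the factor levels to unique(g) keeps the blocks in the original column order, whereas grouping with default levels sorts them alphabetically (so S10 would come before S2).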