How to split a dataframe in elements of the same size with R, overlapped by another size, keeping ev

Time:12-06

I want to apply a method called CCF Sliding-Windows, in which I need to split my time series into windows of 90 days that overlap by 45 days, and the last window needs to end exactly on the final day of the time series (so it may overlap the previous window by more than 45 days).

To illustrate, here is an image representing the application of this method:

[Image: illustration of the CCF sliding-windows method]

Does anybody know how I can do this in R? I want to create a list that holds the time windows of the data frame, so I can purrr::map over it to get the cross-correlations.

Answer:

I'll demonstrate on a vector of integers. Instead of using 90 and 45, I'll use 14 and 7 (arbitrarily) for the sake of brevity.

vec <- 100 + 1:28
winsize <- 14   # window length (stands in for 90)
minsize <- 7    # step between window starts (stands in for 45)

The last window should start at

laststart <- length(vec) - winsize + 1
laststart
# [1] 15

From here, we can split it up as

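# regular starts every minsize elements: 1, 8, 15, 22
starts <- 1 + (seq_len(ceiling(length(vec) / minsize)) - 1) * minsize
# keep the starts before laststart, then add laststart itself
starts <- c(starts[starts < laststart], laststart)
# window i covers elements starts[i] through starts[i] + winsize - 1
Map(function(a, b) vec[a:b], starts, starts - 1 + winsize)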
# [[1]]
#  [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114
# [[2]]
#  [1] 108 109 110 111 112 113 114 115 116 117 118 119 120 121
# [[3]]
#  [1] 115 116 117 118 119 120 121 122 123 124 125 126 127 128

Each of those has length 14, and the last one ends on the last element of vec.
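A short derivation of why the final window always overlaps the previous one by at least winsize - minsize elements, assuming n = length(vec) > winsize and minsize ≤ winsize. Write w = winsize, m = minsize, ℓ = laststart, and s_k for the last regular start below ℓ:

$$
\begin{aligned}
s + w - 1 = n \;&\Longleftrightarrow\; s = n - w + 1 = \ell \\
\ell - m \;\le\; s_k \;&\le\; \ell - 1 \\
\text{overlap} = s_k + w - \ell \;&\in\; [\,w - m,\; w - 1\,]
\end{aligned}
$$

The middle line holds because the regular starts are m apart and the largest of them, 1 + (⌈n/m⌉ - 1)m, is at least n - m + 1 ≥ ℓ. With w = 14 and m = 7 the final overlap lies between 7 and 13 elements; with w = 90 and m = 45 it lies between 45 and 89 days, which is the "more than 45 days" case from the question.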

If the length of the data is not a multiple of the step, this still works.

vec <- 100 + 1:40
winsize <- 14
minsize <- ceiling(winsize / 2)
laststart <- length(vec) - winsize + 1
starts <- 1 + (seq_len(ceiling(length(vec) / minsize)) - 1) * minsize
# windows at regular starts after laststart would run past the end of vec, so keep
# only the starts before laststart and add laststart; the final window then overlaps
# the previous one by between winsize - minsize and winsize - 1 elements
starts <- c(starts[starts < laststart], laststart)
Map(function(a, b) vec[a:b], starts, starts - 1 + winsize)
# [[1]]
#  [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114
# [[2]]
#  [1] 108 109 110 111 112 113 114 115 116 117 118 119 120 121
# [[3]]
#  [1] 115 116 117 118 119 120 121 122 123 124 125 126 127 128
# [[4]]
#  [1] 122 123 124 125 126 127 128 129 130 131 132 133 134 135
# [[5]]
#  [1] 127 128 129 130 131 132 133 134 135 136 137 138 139 140
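Since the question asks for a list of data-frame windows to map over, here is a minimal sketch that wraps the same logic in a function. split_windows is a hypothetical helper name, and the sketch assumes one row per day with no gaps, so 90 rows correspond to 90 days; the example data are random.

```r
# split_windows is a hypothetical helper, not part of any package.
# Assumes one row per day with no gaps, so rows stand in for days.
split_windows <- function(df, winsize = 90, step = 45) {
  n <- nrow(df)
  stopifnot(n >= winsize, step >= 1)
  laststart <- n - winsize + 1
  # regular starts every `step` rows, then one final window ending on row n
  starts <- seq(1, n, by = step)
  starts <- c(starts[starts < laststart], laststart)
  lapply(starts, function(s) df[s:(s + winsize - 1), , drop = FALSE])
}

# Example with made-up daily data (200 days of two random series)
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
  x = rnorm(200),
  y = rnorm(200)
)
wins <- split_windows(df, winsize = 90, step = 45)
sapply(wins, nrow)
# 90 90 90 90
vapply(wins, function(w) format(w$date[1]), character(1))
# "2020-01-01" "2020-02-15" "2020-03-31" "2020-04-20"

# cross-correlation of x and y within each window;
# purrr::map(wins, ...) would work the same way as lapply here
ccfs <- lapply(wins, function(w) ccf(w$x, w$y, lag.max = 10, plot = FALSE))
```

Indexing by rows keeps the date column with each window. If the series has missing days, they would need to be filled in first; otherwise 90 rows no longer mean 90 days.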

Answer:

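You can try rollapply from the zoo package. Note that partial= lets a shorter window through at the end instead of shifting the last full window back, so the final window may be shorter than 90 days.

library(zoo)

# your data "ts"
# width = 90: window size
# FUN = c: returns the raw values of each window
# by = 45: step between window starts (with width 90, a 45-day overlap)
# partial: TRUE, or the minimum size allowed for a shorter window
#  at the end, e.g. half the window size
# align = "left": each window starts at its own index and extends forward
rollapply( ts, 90, c, by=45, partial=45, align="left" )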

# e.g.
rollapply( 1:20, 5, c, by=3, partial=3, align="left" )
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    4    5    6    7    8
[3,]    7    8    9   10   11
[4,]   10   11   12   13   14
[5,]   13   14   15   16   17
[6,]   16   17   18   19   20