Home > Net >  Creating loops across a time series to generate new dataset based on a formula in R
Creating loops across a time series to generate new dataset based on a formula in R

Time:09-29

I haven't been able to figure out the right approach to solve a coding puzzle in R and could use your assistance.

First, here's a small reproducible example of my large dataset:

data <- data.frame(
Date <- sample(c("8/7/2014 23:01", "8/8/2014 10:01", "8/7/2014 11:01", "8/7/2014 12:01", "8/7/2014 13:01")),
`361` <- sample(c("0.035", "0.039", "0.032", "0.042", "0.033")),
`362` <- sample(c("0.038", "0.043", "0.054", "0.023", "0.076")),
`363` <- sample(c("0.038", "0.040", "0.040", "0.020", "0.083")))

I have hundreds of columns of subsequently increasing numbers, and thousands of rows for each hour for multiple years.

What I would like to create is a new time series taking the normalized difference index (NDI) for every single combination of columns. For reference, the NDI formula is: NDI(a:b) = (x-y)/(x y). And for an example, NDI(361:362) = (x-y)/(x y), and using the values of only the first date, 8/7/2014 23:01, it would be NDI(361:362) = (0.035-0.038)/(0.035 0.038) = -0.041.

Now, I would like to create a loop to automate the calculations for NDI(361:363), NDI(362:363), and so on for all the rest of the dates and columns, where the output would be placed in a new data frame.

Any and all help would be greatly appreciated.

Thank you for your time and efforts!

CodePudding user response:

I made some quick changes to the code to create the data frame. This is what I used, hopefully it still captures what you have:

library(tidyverse)

data <- tibble(
  Date = c("8/7/2014 23:01", "8/8/2014 10:01", "8/7/2014 11:01", "8/7/2014 12:01", "8/7/2014 13:01"),
  `361` = c(0.035, 0.039, 0.032, 0.042, 0.033),
  `362` = c(0.038, 0.043, 0.054, 0.023, 0.076),
  `363` = c(0.038, 0.040, 0.040, 0.020, 0.083), 
  `364` = c(0.034, 0.043, 0.029, 0.019, 0.080)
)

The key to making this work was to first identify all the combinations of columns using combn. Then it's just a matter of putting it through a loop, or in this case a map function.

all_combinations <- combn(names(data)[names(data) != 'Date'], 2) %>%
  as_tibble(.name_repair = 'minimal') %>%
  as.list() %>%
  map_dfc(function(combo) {
    new_column_name <- paste('NDI', combo[1], combo[2], sep = '_')
    data %>%
      mutate(!!new_column_name := (!!sym(combo[1]) - !!sym(combo[2])) / (!!sym(combo[1])   !!sym(combo[2]))) %>%
      select(last_col())
  })

bind_cols(
  data, 
  all_combinations
)

# A tibble: 5 x 11
  Date           `361` `362` `363` `364` NDI_361_362 NDI_361_363 NDI_361_364 NDI_362_363 NDI_362_364 NDI_363_364
  <chr>          <dbl> <dbl> <dbl> <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
1 8/7/2014 23:01 0.035 0.038 0.038 0.034     -0.0411     -0.0411      0.0145      0           0.0556      0.0556
2 8/8/2014 10:01 0.039 0.043 0.04  0.043     -0.0488     -0.0127     -0.0488      0.0361      0          -0.0361
3 8/7/2014 11:01 0.032 0.054 0.04  0.029     -0.256      -0.111       0.0492      0.149       0.301       0.159 
4 8/7/2014 12:01 0.042 0.023 0.02  0.019      0.292       0.355       0.377       0.0698      0.0952      0.0256
5 8/7/2014 13:01 0.033 0.076 0.083 0.08      -0.394      -0.431      -0.416      -0.0440     -0.0256      0.0184
  • Related