Interpolating or spline all columns of a data frame

Time:01-26

If a data frame has M rows, how can it be interpolated or splined to create a new data frame with N rows? Here is an example:

library(tibble)

# Start with some vectors of constant length (M = 7) with data at each time point t
df <- tibble(t = c(1, 2, 3, 4, 5, 6, 7),
             y1 = c(0.0, 0.5, 1.0, 3.0, 5.0, 2.0, 0.0),
             y2 = c(0.0, 0.75, 1.5, 3.5, 6.0, 4.0, 0.0),
             y3 = c(0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 0.0))

# How to interpolate or spline these to other numbers of points (rows)?
# By individual column, to spline results to a new vector with length N=15:
spline(x=df$t, y=df$y1, n=15)
spline(x=df$t, y=df$y2, n=15)
spline(x=df$t, y=df$y3, n=15)

So for a single vector this is trivial. The question is: how can this spline be applied to all columns of a dataset with M rows to create a new dataset with N rows, preferably with a tidyverse approach, e.g.:

df15 <- df %>% mutate(...replace(?)...(spline(x=?, y=?, n=15)... ???))

Again, I would like to have this spline be applied across ALL columns without having to specify syntax that includes column names. The intent is to apply this to data frames with something on the order of 100 columns and where names and numbers of columns may vary. It is of course not necessary to include the t (or x) column in the data frame if that simplifies the approach at all. Thanks for any insight.

CodePudding user response:

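spline() returns a list with elements x and y. So we can loop over the columns with across() inside summarise() and then unpack the resulting data-frame columns. summarise() is flexible in that it can return any number of rows, whereas mutate() must return the same number of rows as its input. (In dplyr 1.1.0 and later, reframe() is the recommended verb when the result has more than one row per group.)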

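library(dplyr)
library(tidyr)
library(stringr)
df %>%
   # spline each y column against t; each call yields a list with elements x and y
   summarise(across(y1:y3, ~ spline(t, .x, n = 15) %>%
      as_tibble() %>%
      # prefix x and y with the source column name, e.g. y1x and y1y
      rename_with(~ str_c(cur_column(), .)))) %>%
   # expand the packed data-frame columns into ordinary columns
   unpack(everything())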

-output

# A tibble: 15 × 6
     y1x   y1y   y2x   y2y   y3x   y3y
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  1    0      1    0      1    0    
 2  1.43 0.319  1.43 0.404  1.43 0.542
 3  1.86 0.468  1.86 0.673  1.86 0.905
 4  2.29 0.566  2.29 0.907  2.29 1.18 
 5  2.71 0.752  2.71 1.21   2.71 1.56 
 6  3.14 1.18   3.14 1.68   3.14 2.30 
 7  3.57 1.93   3.57 2.43   3.57 3.33 
 8  4    3      4    3.5    4    4    
 9  4.43 4.24   4.43 4.84   4.43 3.83 
10  4.86 4.99   4.86 5.85   4.86 3.21 
11  5.29 4.56   5.29 5.90   5.29 2.67 
12  5.71 3.12   5.71 4.96   5.71 2.29 
13  6.14 1.47   6.14 3.46   6.14 1.82 
14  6.57 0.269  6.57 1.74   6.57 1.09 
15  7    0      7    0      7    0    

NOTE: The columns are renamed because spline() returns a list whose elements are always named x and y, and a data.frame/tibble needs unique column names.
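
Since every column is splined against the same t, the repeated x columns in the output above all carry the same values. As a possible variation (a minimal sketch, not the answerer's code), the new grid can be built once and passed to spline() through its xout argument, keeping only the y element. This assumes dplyr 1.1.0 or later for reframe(); the names n_out, new_t and df15 are illustrative, and across(-t, ...) selects every column except t so that no y column has to be named.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
df <- tibble(t  = c(1, 2, 3, 4, 5, 6, 7),
             y1 = c(0.0, 0.5, 1.0, 3.0, 5.0, 2.0, 0.0),
             y2 = c(0.0, 0.75, 1.5, 3.5, 6.0, 4.0, 0.0),
             y3 = c(0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 0.0))

n_out <- 15                                              # assumed target number of rows
new_t <- seq(min(df$t), max(df$t), length.out = n_out)   # shared output grid

df15 <- df %>%
  # spline every column except t onto new_t and keep only the y values
  reframe(across(-t, ~ spline(t, .x, xout = new_t)$y)) %>%
  # put the shared grid back as the first column
  mutate(t = new_t, .before = 1)
```

The result should be a tibble with n_out rows and the original column names (t, y1, y2, y3), so it can be fed to code that expects the same layout as df.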

CodePudding user response:

Here is an option with data.table

library(data.table)

setDT(df)[,
  lapply(.SD, function(v) list2DF(spline(t, v, n = 15))),
  .SDcols = patterns("^y\\d+")
]

which gives

        y1.x      y1.y     y2.x      y2.y     y3.x      y3.y
 1: 1.000000 0.0000000 1.000000 0.0000000 1.000000 0.0000000
 2: 1.428571 0.3194303 1.428571 0.4039226 1.428571 0.5423159
 3: 1.857143 0.4680242 1.857143 0.6731712 1.857143 0.9052687
 4: 2.285714 0.5655593 2.285714 0.9065841 2.285714 1.1770242
 5: 2.714286 0.7515972 2.714286 1.2081346 2.714286 1.5555866
 6: 3.142857 1.1773997 3.142857 1.6848330 3.142857 2.3039184
 7: 3.571429 1.9306220 3.571429 2.4271800 3.571429 3.3318454
 8: 4.000000 3.0000000 4.000000 3.5000000 4.000000 4.0000000
 9: 4.428571 4.2387392 4.428571 4.8368010 4.428571 3.8340703
10: 4.857143 4.9919616 4.857143 5.8546581 4.857143 3.2089361
11: 5.285714 4.5551878 5.285714 5.8976389 5.285714 2.6706702
12: 5.714286 3.1239451 5.714286 4.9619776 5.714286 2.2875045
13: 6.142857 1.4724741 6.142857 3.4632587 6.142857 1.8204137
14: 6.571429 0.2685633 6.571429 1.7399284 6.571429 1.0868916
15: 7.000000 0.0000000 7.000000 0.0000000 7.000000 0.0000000
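
The result above repeats the same x grid for every column. If a single t column is preferred, one possible data.table variation is sketched below (an illustration, not the answerer's code). The names dt, n_out, new_t, y_cols and dt15 are hypothetical, and the columns are chosen with setdiff() rather than a name pattern, so it should also work when the column names do not follow the y1, y2, ... scheme.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
dt <- data.table(t  = c(1, 2, 3, 4, 5, 6, 7),
                 y1 = c(0.0, 0.5, 1.0, 3.0, 5.0, 2.0, 0.0),
                 y2 = c(0.0, 0.75, 1.5, 3.5, 6.0, 4.0, 0.0),
                 y3 = c(0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 0.0))

n_out  <- 15                                              # assumed target number of rows
new_t  <- seq(min(dt$t), max(dt$t), length.out = n_out)   # shared output grid
y_cols <- setdiff(names(dt), "t")                         # every column except t

# j returns a list: the new grid first, then one splined vector per column in .SD
dt15 <- dt[, c(list(t = new_t),
               lapply(.SD, function(v) spline(dt$t, v, xout = new_t)$y)),
           .SDcols = y_cols]
```

For linear interpolation instead of a spline, approx(dt$t, v, xout = new_t)$y can be substituted for the spline() call; both functions return a list with elements x and y.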