If a data frame has M rows, how can it be interpolated or splined to create a new data frame with N rows? Here is an example:
library(tibble)

# Start with some vectors of constant length (M=7) with data at each time point t
df <- tibble(t  = c(1, 2, 3, 4, 5, 6, 7),
             y1 = c(0.0, 0.5, 1.0, 3.0, 5.0, 2.0, 0.0),
             y2 = c(0.0, 0.75, 1.5, 3.5, 6.0, 4.0, 0.0),
             y3 = c(0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 0.0))
# How to interpolate or spline these to other numbers of points (rows)?
# By individual column, to spline results to a new vector with length N=15:
spline(x=df$t, y=df$y1, n=15)
spline(x=df$t, y=df$y2, n=15)
spline(x=df$t, y=df$y3, n=15)
So by vector this is trivial. The question is how this spline can be applied to all columns of a dataset with M rows to create a new dataset with N rows, preferably with a tidyverse approach, e.g.:
df15 <- df %>% mutate(...replace(?)...(spline(x=?, y=?, n=15)... ???))
Again, I would like to have this spline be applied across ALL columns without having to specify syntax that includes column names. The intent is to apply this to data frames with something on the order of 100 columns and where names and numbers of columns may vary. It is of course not necessary to include the t (or x) column in the data frame if that simplifies the approach at all. Thanks for any insight.
CodePudding user response:
spline() returns a list. So, we may loop over the columns with across() inside summarise() and then unpack() the columns (summarise() is flexible in returning any number of rows, whereas mutate() is fixed, i.e. it must return the same number of rows as the input).
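library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # spline each y column against t; each result is a list with elements x and y
  summarise(across(y1:y3, ~ spline(t, .x, n = 15) %>%
                     as_tibble %>%
                     # prefix x/y with the source column name so names stay unique
                     rename_with(~ str_c(cur_column(), .)))) %>%
  # turn the packed tibble columns into regular columns
  unpack(everything())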
-output
# A tibble: 15 × 6
y1x y1y y2x y2y y3x y3y
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 0 1 0
2 1.43 0.319 1.43 0.404 1.43 0.542
3 1.86 0.468 1.86 0.673 1.86 0.905
4 2.29 0.566 2.29 0.907 2.29 1.18
5 2.71 0.752 2.71 1.21 2.71 1.56
6 3.14 1.18 3.14 1.68 3.14 2.30
7 3.57 1.93 3.57 2.43 3.57 3.33
8 4 3 4 3.5 4 4
9 4.43 4.24 4.43 4.84 4.43 3.83
10 4.86 4.99 4.86 5.85 4.86 3.21
11 5.29 4.56 5.29 5.90 5.29 2.67
12 5.71 3.12 5.71 4.96 5.71 2.29
13 6.14 1.47 6.14 3.46 6.14 1.82
14 6.57 0.269 6.57 1.74 6.57 1.09
15 7 0 7 0 7 0
NOTE: Here, we renamed the columns because the output from spline() is a list with names x and y, and data.frame/tibble wants unique column names.
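If a single shared t column is preferred over one x column per variable, something along these lines might work. This is only a sketch, assuming dplyr >= 1.1.0, where reframe() is the verb intended for results with an arbitrary number of rows (in those versions summarise() warns when it returns more than one row per group). The names n_out and df15 are illustrative and not part of the answer above. It keeps only the y element of each spline() result and rebuilds t once at the end, so no y column names have to be spelled out.

```r
library(dplyr)

df <- tibble(
  t  = c(1, 2, 3, 4, 5, 6, 7),
  y1 = c(0.0, 0.5, 1.0, 3.0, 5.0, 2.0, 0.0),
  y2 = c(0.0, 0.75, 1.5, 3.5, 6.0, 4.0, 0.0),
  y3 = c(0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 0.0)
)

n_out <- 15  # illustrative name: number of rows wanted in the result

df15 <- df %>%
  reframe(
    # spline every column except t against the original t; keep only the y values
    across(-t, ~ spline(t, .x, n = n_out)$y),
    # rebuild t last, so the across() call above still sees the original 7 values
    t = spline(t, t, n = n_out)$x
  ) %>%
  # move t back to the first position
  relocate(t)

df15
```

The order inside reframe() matters here: t is replaced only after the y columns have been computed, because later expressions see columns created by earlier ones.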
CodePudding user response:
Here is an option with data.table
library(data.table)

setDT(df)[,
  lapply(.SD, function(v) list2DF(spline(t, v, n = 15))),
  .SDcols = patterns("^y\\d+")
]
which gives
y1.x y1.y y2.x y2.y y3.x y3.y
1: 1.000000 0.0000000 1.000000 0.0000000 1.000000 0.0000000
2: 1.428571 0.3194303 1.428571 0.4039226 1.428571 0.5423159
3: 1.857143 0.4680242 1.857143 0.6731712 1.857143 0.9052687
4: 2.285714 0.5655593 2.285714 0.9065841 2.285714 1.1770242
5: 2.714286 0.7515972 2.714286 1.2081346 2.714286 1.5555866
6: 3.142857 1.1773997 3.142857 1.6848330 3.142857 2.3039184
7: 3.571429 1.9306220 3.571429 2.4271800 3.571429 3.3318454
8: 4.000000 3.0000000 4.000000 3.5000000 4.000000 4.0000000
9: 4.428571 4.2387392 4.428571 4.8368010 4.428571 3.8340703
10: 4.857143 4.9919616 4.857143 5.8546581 4.857143 3.2089361
11: 5.285714 4.5551878 5.285714 5.8976389 5.285714 2.6706702
12: 5.714286 3.1239451 5.714286 4.9619776 5.714286 2.2875045
13: 6.142857 1.4724741 6.142857 3.4632587 6.142857 1.8204137
14: 6.571429 0.2685633 6.571429 1.7399284 6.571429 1.0868916
15: 7.000000 0.0000000 7.000000 0.0000000 7.000000 0.0000000
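For wide tables where the y column names are not known in advance, a variation that avoids the name pattern and returns one shared t column could look like the following. It is a rough sketch rather than a tested recipe for every input: dt, x, y_cols, n_out and dt15 are illustrative names, and it assumes every column other than t is numeric and should be splined.

```r
library(data.table)

dt <- data.table(
  t  = c(1, 2, 3, 4, 5, 6, 7),
  y1 = c(0.0, 0.5, 1.0, 3.0, 5.0, 2.0, 0.0),
  y2 = c(0.0, 0.75, 1.5, 3.5, 6.0, 4.0, 0.0),
  y3 = c(0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 0.0)
)

n_out  <- 15                        # illustrative: number of rows wanted
x      <- dt[["t"]]                 # original time points
y_cols <- setdiff(names(dt), "t")   # every column except t

# spline each y column against x and keep only the y values
dt15 <- dt[, lapply(.SD, function(v) spline(x, v, n = n_out)$y),
           .SDcols = y_cols]

# add the new time points once and move them to the front
dt15[, t := spline(x, x, n = n_out)$x]
setcolorder(dt15, "t")

dt15
```

Pulling t out into x before the call keeps the original time points available after the result has a different number of rows.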