I have a long list of time diff calculations I want to make. Given that the column names are long and contain spaces (they are pulled from an API), I've created a table of the variable names to be used for each operation:
stage_refs <- structure(list(new.diff.var = c("time diff 1", "time diff 2",
"time diff 3", "time diff 4"), var.1 = c("time value 2", "time value 3",
"time value 4", "time value 5"), var.2 = c("time value 1", "time value 2",
"time value 3", "time value 4")), row.names = c(NA, -4L), spec = structure(list(
cols = list(new.diff.var = structure(list(), class = c("collector_character",
"collector")), var.1 = structure(list(), class = c("collector_character",
"collector")), var.2 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
And a data frame of dates, with missing values where no date has been recorded yet:
date_values <- structure(list(`time value 1` = structure(c(18993, 18993, 18993,
NA), class = "Date"), `time value 2` = structure(c(19024, NA,
19026, 19027), class = "Date"), `time value 3` = structure(c(NA,
19084, 19085, 19086), class = "Date"), `time value 4` = structure(c(19113,
19114, NA, 19116), class = "Date"), `time value 5` = structure(c(19174,
19175, 19176, 19177), class = "Date")), row.names = c(NA, -4L
), class = c("tbl_df", "tbl", "data.frame"))
In the production data set there are up to 20 time diff calculations to make, which is why I've put the names in a table so I can use a function such as:
library(tidyverse)
difftime_fun <- function(x, y, z) {
date_values |>
mutate(!!x = difftime(y, z))
}
... to create a new column from x by the calculation between y & z, and then apply this function row-wise over stage_refs using a loop:
for(i in 1:nrow(stage_refs)) {
difftime_fun(
stage_refs$new.diff.var[[i]],
stage_refs$var.1[[i]],
stage_refs$var.2[[i]]
)
}
What I want is as many timediff columns added to date_values as there are rows in stage_refs, using the variable names from stage_refs for assignment of the cols and for the calculations.
There are 2 places I'm stuck:
- I'm running into non-standard-evaluation problems with using the variable names in the function. I've tried combinations of !!, {{ }}, eval etc. and can't make sense of it.
- I suspect there's a better way than using the loop to go row-wise, using apply or some such, but not solving (1) means it's hard to trial and error using apply.
I have looked at this solution, but can't make sense of how to use it for this problem.
Thanks.
For clarification the final df would have the following columns:
[1] "time value 1" "time value 2" "time value 3" "time value 4" "time value 5" "time diff 1"
[7] "time diff 2" "time diff 3" "time diff 4"
CodePudding user response:
To achieve your desired result you could first rewrite your function by adding a data argument and using the .data pronoun, which allows you to access data columns passed as character strings. After doing so you could e.g. use purrr::reduce or base Reduce to loop over the rows of your stage_refs data frame and apply your difftime_fun to create your diff time columns.
Note: In row 4 of your stage_refs a column "time value 7" was specified which was not part of your example data. I simply replaced it by "time value 4".
library(tidyverse)
difftime_fun <- function(.data, x, y, z) {
.data |>
mutate("{x}" := difftime(.data[[y]], .data[[z]]))
}
purrr::reduce(seq(nrow(stage_refs)), function(.x, i) {
difftime_fun(.x,
stage_refs$new.diff.var[[i]],
stage_refs$var.1[[i]],
stage_refs$var.2[[i]]
)
}, .init = date_values)
#> # A tibble: 4 × 9
#> time val…¹ time val…² time val…³ time val…⁴ time val…⁵ time …⁶ time …⁷ time …⁸
#> <date> <date> <date> <date> <date> <drtn> <drtn> <drtn>
#> 1 2022-01-01 2022-02-01 NA 2022-05-01 2022-07-01 31 days NA days NA days
#> 2 2022-01-01 NA 2022-04-02 2022-05-02 2022-07-02 NA days NA days 30 days
#> 3 2022-01-01 2022-02-03 2022-04-03 NA 2022-07-03 33 days 59 days NA days
#> 4 NA 2022-02-04 2022-04-04 2022-05-04 2022-07-04 NA days 59 days 30 days
#> # … with 1 more variable: `time diff 4` <drtn>, and abbreviated variable names
#> # ¹`time value 1`, ²`time value 2`, ³`time value 3`, ⁴`time value 4`,
#> # ⁵`time value 5`, ⁶`time diff 1`, ⁷`time diff 2`, ⁸`time diff 3`
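If a plain loop is easier to follow, the same function can also be driven by a for loop, which is closer to the attempt in the question. The sketch below runs on a made-up two-row tibble (toy_dates and toy_refs are hypothetical stand-ins for date_values and stage_refs, not data from the question). The two things to notice are that "{x}" := and .data[[y]] both take the column names as plain strings, and that the result of each call has to be assigned back, otherwise the new column is thrown away on every iteration.

```r
library(dplyr)

# Hypothetical stand-ins for date_values and stage_refs
toy_dates <- tibble(
  `time value 1` = as.Date(c("2022-01-01", NA)),
  `time value 2` = as.Date(c("2022-02-01", "2022-02-04")),
  `time value 3` = as.Date(c("2022-04-01", "2022-04-04"))
)
toy_refs <- tibble(
  new.diff.var = c("time diff 1", "time diff 2"),
  var.1 = c("time value 2", "time value 3"),
  var.2 = c("time value 1", "time value 2")
)

# x, y and z are character strings holding column names
difftime_fun <- function(.data, x, y, z) {
  .data |>
    mutate("{x}" := difftime(.data[[y]], .data[[z]]))
}

# Start from the original data and keep overwriting `result`,
# so each pass adds one more diff column
result <- toy_dates
for (i in seq_len(nrow(toy_refs))) {
  result <- difftime_fun(
    result,
    toy_refs$new.diff.var[[i]],
    toy_refs$var.1[[i]],
    toy_refs$var.2[[i]]
  )
}

names(result)
```

seq_len(nrow(...)) is used instead of 1:nrow(...) so the loop is simply skipped if the reference table happens to be empty.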
CodePudding user response:
Another alternative is to build function calls from your reference data frame using pmap() and to splice those inside of mutate(). This will be more efficient than iteratively calling mutate().
library(purrr)
library(dplyr)
date_values %>%
mutate(!!!setNames(pmap(stage_refs, ~ call("difftime", sym(..2), sym(..3))), stage_refs$new.diff.var))
# A tibble: 4 × 9
`time value 1` `time value 2` `time value 3` `time value 4` `time value 5` `time diff 1` `time diff 2` `time diff 3` `time diff 4`
<date> <date> <date> <date> <date> <drtn> <drtn> <drtn> <drtn>
1 2022-01-01 2022-02-01 NA 2022-05-01 2022-07-01 31 days NA days NA days 61 days
2 2022-01-01 NA 2022-04-02 2022-05-02 2022-07-02 NA days NA days 30 days 61 days
3 2022-01-01 2022-02-03 2022-04-03 NA 2022-07-03 33 days 59 days NA days NA days
4 NA 2022-02-04 2022-04-04 2022-05-04 2022-07-04 NA days 59 days 30 days 61 days
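To see what actually gets spliced, it can help to build the list of calls on its own and inspect it before handing it to mutate(). This is only a sketch on hypothetical toy data (toy_dates and toy_refs are made up, not from the question). It uses an explicit function with named arguments in pmap() instead of the ..2 / ..3 shorthand, which should behave the same here because pmap() passes the columns of the data frame by name.

```r
library(purrr)
library(dplyr)
library(rlang)

# Hypothetical stand-ins for date_values and stage_refs
toy_dates <- tibble(
  `time value 1` = as.Date(c("2022-01-01", NA)),
  `time value 2` = as.Date(c("2022-02-01", "2022-02-04")),
  `time value 3` = as.Date(c("2022-04-01", "2022-04-04"))
)
toy_refs <- tibble(
  new.diff.var = c("time diff 1", "time diff 2"),
  var.1 = c("time value 2", "time value 3"),
  var.2 = c("time value 1", "time value 2")
)

# One unevaluated call per row of toy_refs, e.g.
# difftime(`time value 2`, `time value 1`)
diff_calls <- pmap(toy_refs, function(new.diff.var, var.1, var.2) {
  call("difftime", sym(var.1), sym(var.2))
})

# The list names become the names of the new columns
diff_calls <- setNames(diff_calls, toy_refs$new.diff.var)
print(diff_calls)

# !!! splices the named calls in as if they had been typed as
# separate arguments to a single mutate()
result <- toy_dates %>%
  mutate(!!!diff_calls)

names(result)
```

Because all the expressions end up in one mutate() call, the data frame is only modified once, however many rows the reference table has.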