I need to create new columns from existing ones using a function in the tidyverse. I planned to use across() because it allows dynamic renaming of the new variables, which is increasingly important in my case and saves a lot of time when there are many variables to modify. The function below is not applied column-wise as I expected: changing the values of the P argument gives unexpected output each time, particularly when I set some values to 1, as if the function were applied element-wise rather than column-wise.
How could this code be written more efficiently to achieve the above goal? By efficient I mean shorter, faster, and neater.
Reprex
library(dplyr) # provides tibble(), mutate(), across() and %>%
set.seed(123)
df <- tibble(id = 1:10,
             rosa = runif(10, min = 20.8, max = 36.5),
             lila = runif(10, min = 17, max = 37),
             blaue = runif(10, min = 23.3, max = 32.7))
df[c(2, 5, 8), 2:4] <- NA
Code
myfun <- function(x, P = 2, na.rm = FALSE) {
  P ^ (min(x, na.rm = na.rm) - x)
}
P <- c(2, 1.5, 1.1) # fiddle with numbers here and see the output each time changes
names <- c("rosa", "lila", "blaue")
df %>%
  select(!!names) %>%
  mutate(across(.cols = !!names,
                .fns = ~ myfun(.x, P, na.rm = TRUE),
                .names = "{.col}_P"))
Output
# A tibble: 10 × 6
rosa lila blaue rosa_P lila_P blaue_P
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 25.3 36.1 31.7 0.0718 0.0000526 0.00793
2 NA NA NA NA NA NA
3 27.2 30.6 29.3 0.581 0.439 0.643
4 34.7 28.5 32.6 0.000110 0.0108 0.00401
5 NA NA NA NA NA NA
6 21.5 35.0 30.0 1 0.288 0.605
7 29.1 21.9 28.4 0.00524 1 0.0753
8 NA NA NA NA NA NA
9 29.5 23.6 26.0 0.469 0.856 0.881
10 28.0 36.1 24.7 0.0114 0.0000543 1
Warning messages:
1: Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length
2: Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length
3: Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length
Expected Output
df %>%
  select(!!names) %>%
  mutate(rosa_P = 2^(min(rosa, na.rm = TRUE) - rosa)) %>%
  mutate(lila_P = 1.5^(min(lila, na.rm = TRUE) - lila)) %>%
  mutate(blaue_P = 1.1^(min(blaue, na.rm = TRUE) - blaue))
# A tibble: 10 × 6
rosa lila blaue rosa_P lila_P blaue_P
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 25.3 36.1 31.7 0.0718 0.00314 0.514
2 NA NA NA NA NA NA
3 27.2 30.6 29.3 0.0192 0.0302 0.643
4 34.7 28.5 32.6 0.000110 0.0708 0.468
5 NA NA NA NA NA NA
6 21.5 35.0 30.0 1 0.00498 0.605
7 29.1 21.9 28.4 0.00524 1 0.701
8 NA NA NA NA NA NA
9 29.5 23.6 26.0 0.00407 0.515 0.881
10 28.0 36.1 24.7 0.0114 0.00320 1
Answer
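Your problem is the P vector: across() does not know which number belongs to which column, so it passes all three numbers to myfun on every call. Inside myfun they are then recycled element-wise along the column, which is why the results change from row to row and why R warns that the longer object length is not a multiple of the shorter one. Instead, you could name the vector and use cur_column() to pick the matching value for each column.
It might also be safer to use all_of()/any_of() instead of !!, and to give the vector of column names a different name than names, since names is also a base function and reusing it might lead to confusion.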
library(dplyr)
P <- c(rosa = 2, lila = 1.5, blaue = 1.1)
colournames <- names(P) # c("rosa", "lila", "blaue")
df |>
  # select(all_of(colournames)) |>
  mutate(across(all_of(colournames),
                # equivalently: ~ myfun(., P[cur_column()], na.rm = TRUE)
                ~ P[cur_column()] ^ (min(., na.rm = TRUE) - .),
                .names = "{.col}_P"))
Output:
# A tibble: 10 × 6
rosa lila blaue rosa_P lila_P blaue_P
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 25.3 36.1 31.7 0.0718 0.00314 0.514
2 NA NA NA NA NA NA
3 27.2 30.6 29.3 0.0192 0.0302 0.643
4 34.7 28.5 32.6 0.000110 0.0708 0.468
5 NA NA NA NA NA NA
6 21.5 35.0 30.0 1 0.00498 0.605
7 29.1 21.9 28.4 0.00524 1 0.701
8 NA NA NA NA NA NA
9 29.5 23.6 26.0 0.00407 0.515 0.881
10 28.0 36.1 24.7 0.0114 0.00320 1
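To see why the question's call gave row-dependent results, here is a minimal base R sketch of the recycling involved. The short vector x is an illustrative stand-in for one column, not the question's data, and the names recycled and intended are made up for this example.

```r
# Base R only: reproduce the recycling that happens when the whole P vector
# is passed to the function for a single column.
x <- c(25.3, 27.2, 34.7, 21.5)  # illustrative stand-in for one column
P <- c(2, 1.5, 1.1)             # unnamed vector, as in the question

# All of P is recycled along x: element 1 uses base 2, element 2 uses 1.5,
# element 3 uses 1.1, and element 4 wraps around to base 2 again.
# R warns here because length 4 is not a multiple of length 3.
recycled <- suppressWarnings(P ^ (min(x) - x))

# What was intended for this column: a single scalar base for every element.
intended <- 2 ^ (min(x) - x)

# Only the positions that happened to receive base 2 agree with the intent.
print(recycled == intended)
```

In this sketch only the first and fourth elements match, which mirrors the question's output, where rosa_P is correct in rows 1, 4, 7 and 10 but wrong in the rows in between.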
Update: changed to match the OP's revised example.
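If you would rather keep myfun than inline the formula, the same idea can be sketched as below. This is only one way to write it, under a few assumptions: toy and bases are hypothetical names with made-up values chosen so the results are easy to check by hand, and [[ ]] is used instead of [ ] so that a plain unnamed number is passed as P.

```r
library(dplyr)

# Same helper as in the question: P^(min(x) - x), with optional NA removal.
myfun <- function(x, P = 2, na.rm = FALSE) {
  P ^ (min(x, na.rm = na.rm) - x)
}

# Small illustrative data (hypothetical, not the question's random data).
toy <- tibble(
  rosa  = c(3, NA, 1, 2),
  lila  = c(5, NA, 4, 6),
  blaue = c(2, NA, 2, 3)
)

# One base per column, looked up by column name.
bases <- c(rosa = 2, lila = 1.5, blaue = 1.1)

res <- toy |>
  mutate(across(all_of(names(bases)),
                # cur_column() is the name of the column being processed,
                # so each column gets exactly one scalar base
                ~ myfun(.x, P = bases[[cur_column()]], na.rm = TRUE),
                .names = "{.col}_P"))

print(res)
```

Because the names of bases drive both the column selection and the lookup, adding another column only requires adding one named element to that vector.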