I want to calculate the geometrical mean of two vectors in a tibble using the tidyverse. The calculated mean should be done rowwise for the two variables. I wrote the function below to this end, and it worked, but I am just wondering how could this operation be done or written in a more efficient way of coding, with more efficient I mean less code, faster and neater. Any better ideas? Just thinking loud, can map_*()
be implemented in this case? I am also aware of using rowwise()
but as far as I know lately the author of the tidyverse Hadley Wickham downplayed the use of rowwise()
strategically.
A minimal representative example is below:
Reprex
df <- tribble(
~v1, ~ v2,
4, 5,
NA, 7,
2, 2,
3, NA,
NA, NA,
9, 9)
Suggested function
gMean <- function (df, v1, v2){
output <- vector ("double", nrow (df))
for (i in 1:nrow(df)){
output[[i]] <- case_when (!is.na(df$v1[i]) && !is.na(df$v2[i]) ~ ((df$v1[i] * df$v2[i]) ^ 0.5),
is.na (df$v1[i]) && is.na (df$v2[i]) ~ 1,
!is.na(df$v1[i]) && is.na(df$v2[i]) ~ df$v1[i],
is.na(df$v1[i]) && !is.na(df$v2[i]) ~ df$v2[i]
)
}
output
}
output
df %>%
gMean (v1, v2)
[1] 4.472136 7.000000 2.000000 3.000000 1.000000 9.000000
CodePudding user response:
You could also (just) use mutate
instead of looping over each row.
In your case, there is no need to map
or to use rowwise
, and as case_when
is evaluating from the bottom and up, you can simplify your is.na
calls as well.
df |> mutate(gMean = case_when(is.na(v1) & is.na(v2) ~ 1,
is.na(v1) ~ v2,
is.na(v2) ~ v1,
TRUE ~ sqrt(v1 * v2)))
However, if we want to use rowwise()
or map2_dbl()
we could use prod
to allow for an na.rm
-option, and only take the square root (^(1/2)
), where 2 values are available. Utilizing that 1/0
is defined as Inf
in R, and 1^Inf
as 1
.
df |>
rowwise() |>
mutate(gMean = prod(v1, v2, na.rm = TRUE) ^ (1 / sum(c(!is.na(v1), !is.na(v2))))) |>
ungroup()
df |>
mutate(gMean = map2_dbl(v1, v2, ~ prod(.x, .y, na.rm = TRUE) ^ (1 / sum(c(!is.na(.x), !is.na(.y))))))
Output:
# A tibble: 6 × 3
v1 v2 gMean
<dbl> <dbl> <dbl>
1 4 5 4.47
2 NA 7 7
3 2 2 2
4 3 NA 3
5 NA NA 1
6 9 9 9
CodePudding user response:
Another possible solution:
library(tidyverse)
df %>%
mutate(gMean = map2_dbl(v1, v2, ~ sqrt(.x * .y)) %>%
coalesce(v1, v2) %>% if_else(is.na(.), 1, .))
#> # A tibble: 6 × 3
#> v1 v2 gMean
#> <dbl> <dbl> <dbl>
#> 1 4 5 4.47
#> 2 NA 7 7
#> 3 2 2 2
#> 4 3 NA 3
#> 5 NA NA 1
#> 6 9 9 9