what is the best code to write this conditional rowwise function in a tibble?-CodePudding

I want to calculate the geometrical mean of two vectors in a tibble using the tidyverse. The calculated mean should be done rowwise for the two variables. I wrote the function below to this end, and it worked, but I am just wondering how could this operation be done or written in a more efficient way of coding, with more efficient I mean less code, faster and neater. Any better ideas? Just thinking loud, can map_*() be implemented in this case? I am also aware of using rowwise() but as far as I know lately the author of the tidyverse Hadley Wickham downplayed the use of rowwise() strategically.

A minimal representative example is below:

Reprex

df <- tribble(
    ~v1, ~ v2,
    4, 5,
    NA, 7,
    2, 2,
    3, NA,
    NA, NA,
    9, 9)

Suggested function

gMean <- function (df, v1, v2){
    output <- vector ("double", nrow (df))
    for (i in 1:nrow(df)){
        output[[i]] <- case_when (!is.na(df$v1[i]) && !is.na(df$v2[i]) ~ ((df$v1[i] * df$v2[i]) ^ 0.5), 
                                  is.na (df$v1[i]) && is.na (df$v2[i]) ~ 1, 
                                  !is.na(df$v1[i]) && is.na(df$v2[i]) ~ df$v1[i], 
                                  is.na(df$v1[i]) && !is.na(df$v2[i]) ~ df$v2[i]
                                  )
    }
    output
}

output

df %>%
    gMean (v1, v2)

[1] 4.472136 7.000000 2.000000 3.000000 1.000000 9.000000

CodePudding user response：

You could also (just) use mutate instead of looping over each row.

In your case, there is no need to map or to use rowwise, and as case_when is evaluating from the bottom and up, you can simplify your is.na calls as well.

df |> mutate(gMean = case_when(is.na(v1) & is.na(v2) ~ 1,
                               is.na(v1) ~ v2,
                               is.na(v2) ~ v1,
                               TRUE ~ sqrt(v1 * v2)))

However, if we want to use rowwise() or map2_dbl() we could use prod to allow for an na.rm-option, and only take the square root (^(1/2)), where 2 values are available. Utilizing that 1/0 is defined as Inf in R, and 1^Inf as 1.

df |>
  rowwise() |>
  mutate(gMean = prod(v1, v2, na.rm = TRUE) ^ (1 / sum(c(!is.na(v1), !is.na(v2))))) |>
  ungroup()

df |>
  mutate(gMean = map2_dbl(v1, v2, ~ prod(.x, .y, na.rm = TRUE) ^ (1 / sum(c(!is.na(.x), !is.na(.y))))))

Output:

# A tibble: 6 × 3
     v1    v2 gMean
  <dbl> <dbl> <dbl>
1     4     5  4.47
2    NA     7  7   
3     2     2  2   
4     3    NA  3   
5    NA    NA  1   
6     9     9  9

CodePudding user response：

Another possible solution:

library(tidyverse)

df %>% 
  mutate(gMean = map2_dbl(v1, v2, ~ sqrt(.x * .y)) %>% 
           coalesce(v1, v2) %>% if_else(is.na(.), 1, .))

#> # A tibble: 6 × 3
#>      v1    v2 gMean
#>   <dbl> <dbl> <dbl>
#> 1     4     5  4.47
#> 2    NA     7  7   
#> 3     2     2  2   
#> 4     3    NA  3   
#> 5    NA    NA  1   
#> 6     9     9  9