Home > Blockchain >  what is the best code to write this conditional rowwise function in a tibble?
what is the best code to write this conditional rowwise function in a tibble?

Time:07-26

I want to calculate the geometrical mean of two vectors in a tibble using the tidyverse. The calculated mean should be done rowwise for the two variables. I wrote the function below to this end, and it worked, but I am just wondering how could this operation be done or written in a more efficient way of coding, with more efficient I mean less code, faster and neater. Any better ideas? Just thinking loud, can map_*() be implemented in this case? I am also aware of using rowwise() but as far as I know lately the author of the tidyverse Hadley Wickham downplayed the use of rowwise() strategically.

A minimal representative example is below:

Reprex

df <- tribble(
    ~v1, ~ v2,
    4, 5,
    NA, 7,
    2, 2,
    3, NA,
    NA, NA,
    9, 9)

Suggested function

gMean <- function (df, v1, v2){
    output <- vector ("double", nrow (df))
    for (i in 1:nrow(df)){
        output[[i]] <- case_when (!is.na(df$v1[i]) && !is.na(df$v2[i]) ~ ((df$v1[i] * df$v2[i]) ^ 0.5), 
                                  is.na (df$v1[i]) && is.na (df$v2[i]) ~ 1, 
                                  !is.na(df$v1[i]) && is.na(df$v2[i]) ~ df$v1[i], 
                                  is.na(df$v1[i]) && !is.na(df$v2[i]) ~ df$v2[i]
                                  )
    }
    output
}

output

df %>%
    gMean (v1, v2)

[1] 4.472136 7.000000 2.000000 3.000000 1.000000 9.000000

CodePudding user response:

You could also (just) use mutate instead of looping over each row.

In your case, there is no need to map or to use rowwise, and as case_when is evaluating from the bottom and up, you can simplify your is.na calls as well.

df |> mutate(gMean = case_when(is.na(v1) & is.na(v2) ~ 1,
                               is.na(v1) ~ v2,
                               is.na(v2) ~ v1,
                               TRUE ~ sqrt(v1 * v2)))

However, if we want to use rowwise() or map2_dbl() we could use prod to allow for an na.rm-option, and only take the square root (^(1/2)), where 2 values are available. Utilizing that 1/0 is defined as Inf in R, and 1^Inf as 1.

df |>
  rowwise() |>
  mutate(gMean = prod(v1, v2, na.rm = TRUE) ^ (1 / sum(c(!is.na(v1), !is.na(v2))))) |>
  ungroup()
df |>
  mutate(gMean = map2_dbl(v1, v2, ~ prod(.x, .y, na.rm = TRUE) ^ (1 / sum(c(!is.na(.x), !is.na(.y))))))

Output:

# A tibble: 6 × 3
     v1    v2 gMean
  <dbl> <dbl> <dbl>
1     4     5  4.47
2    NA     7  7   
3     2     2  2   
4     3    NA  3   
5    NA    NA  1   
6     9     9  9  

CodePudding user response:

Another possible solution:

library(tidyverse)

df %>% 
  mutate(gMean = map2_dbl(v1, v2, ~ sqrt(.x * .y)) %>% 
           coalesce(v1, v2) %>% if_else(is.na(.), 1, .))

#> # A tibble: 6 × 3
#>      v1    v2 gMean
#>   <dbl> <dbl> <dbl>
#> 1     4     5  4.47
#> 2    NA     7  7   
#> 3     2     2  2   
#> 4     3    NA  3   
#> 5    NA    NA  1   
#> 6     9     9  9
  • Related