Case_when statements in pipe operator on a vector

Time:12-26

I am trying to understand how to better use if else or case_when in a pipeline when manipulating a vector. After scraping an element of a website I am left with this vector:

[1] "66"        "121"       "112 - 150" "211"       "197"       "25"        "72"       
[8] "59"        "100"       "69 - 194" 

vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100", 
"69 - 194")

library(tidyverse)
library(stringr)  # version 1.5.0

I want to manipulate them as a vector before I put them in a data frame/tibble, so that if a string contains two numbers (e.g. "112 - 150"), it is replaced with the mean of the two. I have tried the following:

vector %>%
  case_when(
    str_detect(., "-") ~ . %>%
      str_split_1(" - ") %>%
      as.numeric() %>%
      mean(),
    T ~ .
  ) 

This does not work, although the same steps work on a single string:

"112 - 150" %>% 
  str_split_1(" - ") %>% 
  as.numeric() %>% 
  mean()

[1] 131

Then I thought perhaps case_when() does not work with a vector. But it clearly does:

case_when(
  vector == "66" ~ "SIXSIX", 
  TRUE ~ "NOT 66"
)

 [1] "SIXSIX" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66"
 [9] "NOT 66" "NOT 66"

I would prefer a suggestion that avoids the conventional if statement, such as:

vector %>% 
  {if (cond) ** else **}

CodePudding user response:

When written with a pipe, vector %>% case_when(...) is evaluated as case_when(vector, ...). Every argument to case_when(), including the first, must be a two-sided formula, so passing the character vector in that position fails with:

Error in `case_when()`:
! Case 1 (`.`) must be a two-sided formula, not a character vector.
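
To make the pipe behaviour concrete, here is a minimal sketch. It assumes dplyr (which re-exports %>% from magrittr) and uses a hypothetical two-element input x with placeholder labels rather than the real computation. Wrapping the call in braces is one way to stop the pipe from inserting the left-hand side as the first argument:

```r
library(dplyr)  # provides case_when() and re-exports %>%

x <- c("66", "112 - 150")  # hypothetical short input

# The pipe inserts x as the first argument, i.e. case_when(x, ...),
# which case_when() rejects because x is not a formula.
piped <- tryCatch(
  x %>% case_when(grepl("-", .) ~ "range", TRUE ~ "single"),
  error = function(e) "error"
)
piped
# [1] "error"

# Braces suppress the first-argument insertion; `.` still refers to x.
braced <- x %>% { case_when(grepl("-", .) ~ "range", TRUE ~ "single") }
braced
# [1] "single" "range"
```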

In this case, you don't need case_when, since you can apply mean even to single elements:

library(purrr)
library(stringr)
library(dplyr)

vector %>% 
  str_split(' - ') %>% 
  map_dbl(~ mean(as.numeric(.x)))
#[1]  66.0 121.0 131.0 211.0 197.0  25.0  72.0  59.0 100.0 131.5
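
The reason no branching is needed can be seen by looking at the intermediate result. The sketch below uses a hypothetical two-element input x and swaps purrr's map_dbl() for base R's vapply(), which should behave the same way here:

```r
library(stringr)

x <- c("66", "112 - 150")  # hypothetical short input

# str_split() returns a list with one character vector per input string:
# parts[[1]] is "66", parts[[2]] is c("112", "150")
parts <- str_split(x, " - ")

# The mean of a length-one vector is the value itself, so single numbers
# pass through unchanged and ranges are averaged.
means <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
means
# [1]  66 131
```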

If you do want case_when(), call it directly and refer to vector inside each formula instead of piping into it:

case_when( 
  str_detect(vector, "-") ~ vector %>% 
    str_split(' - ') %>% 
    map_dbl(~ mean(as.numeric(.x))),
  TRUE ~ as.numeric(vector)
)
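
One caveat with the case_when() version: it evaluates every right-hand side on the whole vector before picking a value per element, so as.numeric() is also applied to the range strings and warns about NAs introduced by coercion. The result is still correct, because those NA positions are filled from the first branch. A minimal sketch on a hypothetical two-element input x, with the warning silenced by suppressWarnings():

```r
library(dplyr)
library(stringr)
library(purrr)

x <- c("66", "112 - 150")  # hypothetical short input

# Both right-hand sides are computed for all of x. as.numeric("112 - 150")
# is NA, but that element is taken from the first branch, so the NA is
# never selected; suppressWarnings() only hides the coercion warning.
res <- case_when(
  str_detect(x, "-") ~ map_dbl(str_split(x, " - "), ~ mean(as.numeric(.x))),
  TRUE ~ suppressWarnings(as.numeric(x))
)
res
# [1]  66 131
```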