I am trying to understand how to better use if else or case_when() in a pipeline when manipulating a vector. After scraping an element of a website, I am left with this vector:
[1] "66" "121" "112 - 150" "211" "197" "25" "72"
[8] "59" "100" "69 - 194"
In reproducible form:
vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")
library(tidyverse)
library(stringr)  # stringr 1.5.0, needed for str_split_1()
I want to transform the values while they are still in a vector, before I put them in a data frame/tibble: if a string contains two numbers (e.g. "112 - 150"), it should be replaced with the mean of the two. I have tried the following:
vector %>%
case_when(
str_detect(., "-") ~ . %>%
str_split_1(" - ") %>%
as.numeric() %>%
mean(),
T ~ .
)
This does not work, although the same steps work on a single string:
"112 - 150" %>%
str_split_1(" - ") %>%
as.numeric() %>%
mean()
[1] 131
Then I thought that perhaps case_when() does not work with a vector, but it clearly does:
case_when(
vector == "66" ~ "SIXSIX",
TRUE ~ "NOT 66"
)
[1] "SIXSIX" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66" "NOT 66"
[9] "NOT 66" "NOT 66"
I would prefer a solution that avoids a conventional if statement such as:
vector %>%
{if (cond) ** else **}
Answer:
When written with a pipe, vector %>% case_when(...) evaluates as case_when(vector, ...). Every argument passed to case_when() through ..., including the first one, must be a two-sided formula, so the bare character vector triggers an error:
Error in case_when():
! Case 1 (.) must be a two-sided formula, not a character vector.
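If you want to keep case_when() inside the pipe, one option is a sketch like the following, assuming magrittr's %>% (re-exported by dplyr): wrapping the right-hand side in braces stops %>% from inserting the vector as the first argument, so . can be used inside the conditions. Two further issues in the original attempt are avoided here: a pipeline that starts with . (as in . %>% str_split_1(...)) creates a function rather than applying the steps, and str_split_1() only accepts a single string, so str_split() plus map_dbl() is used instead.

```r
library(dplyr)    # case_when() and the %>% pipe
library(stringr)  # str_detect(), str_split()
library(purrr)    # map_dbl()

vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")

# Braces make %>% evaluate the block with `.` bound to `vector`,
# instead of calling case_when(vector, ...).
result <- vector %>% {
  case_when(
    # Ranges: split on " - " and average the two numbers
    str_detect(., "-") ~ map_dbl(str_split(., " - "), ~ mean(as.numeric(.x))),
    # Everything else: plain conversion. case_when() evaluates this on the
    # whole vector, so the range entries would produce coercion warnings;
    # those positions are already covered by the first condition.
    TRUE ~ suppressWarnings(as.numeric(.))
  )
}

result
```

The braces are what make this work; without them the same case_when() call fails with the error shown above.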
In this case you don't need case_when() at all, because mean() also works on a single element (the mean of one number is the number itself):
library(purrr)
library(stringr)
library(dplyr)
vector %>%
str_split(' - ') %>%
map_dbl(~ mean(as.numeric(.x)))
#[1] 66.0 121.0 131.0 211.0 197.0 25.0 72.0 59.0 100.0 131.5
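The asker's single-string pipeline can also be reused almost unchanged by mapping it over the vector one element at a time. A minimal sketch, assuming stringr >= 1.5.0 (where str_split_1() was introduced), with a base R equivalent for comparison; the names means and means_base are only for illustration:

```r
library(magrittr) # %>%
library(purrr)    # map_dbl()
library(stringr)  # str_split_1(), stringr >= 1.5.0

vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")

# Apply the single-string pipeline to each element separately.
# A value without " - " splits into one piece, whose mean is itself.
means <- map_dbl(vector, ~ .x %>% str_split_1(" - ") %>% as.numeric() %>% mean())

# Base R equivalent, without purrr or stringr
means_base <- vapply(strsplit(vector, " - ", fixed = TRUE),
                     function(x) mean(as.numeric(x)),
                     numeric(1))

means
means_base
```

Because str_split_1() receives exactly one string per call here, it does not hit the single-string restriction that breaks the case_when() attempt.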
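If you still want to use case_when(), this also works. Note that case_when() evaluates every right-hand side on the full vector, so as.numeric(vector) emits an "NAs introduced by coercion" warning for the range entries, even though those positions are taken from the first branch: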
case_when(
str_detect(vector, "-") ~ vector %>%
str_split(' - ') %>%
map_dbl(~ mean(as.numeric(.x))),
TRUE ~ as.numeric(vector)  # TRUE rather than T, since T can be reassigned
)
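Since the values end up in a data frame/tibble anyway, the same conversion can also be done after building it, inside mutate(). A minimal sketch; the column names raw and value are hypothetical:

```r
library(dplyr)    # mutate(), %>%
library(tibble)   # tibble()
library(stringr)  # str_split()
library(purrr)    # map_dbl()

vector <- c("66", "121", "112 - 150", "211", "197", "25", "72", "59", "100",
            "69 - 194")

# Keep the scraped strings in `raw` and store the numeric result in `value`:
# ranges become the mean of their two endpoints, single numbers stay as-is.
df <- tibble(raw = vector) %>%
  mutate(value = map_dbl(str_split(raw, " - "), ~ mean(as.numeric(.x))))

df
```

Keeping the raw column alongside the converted one makes it easy to check which entries were ranges.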