Why does this case_when command do the command for all cases?

Time:03-04

In column 1, the value 4 represents the answer "other", and column 2 records what that "other" is. I want every row whose column 1 value is not 4 to have NA in column 2. Using case_when does not give the result I expect.

I have this data

col1    col2
1       "a"
4       "c"
4       NA
3       NA

I run:

df <- df %>%
  mutate(col2 = case_when(col1 != 4 ~ NA))

And expect:

col1    col2
1       NA
4       "c"
4       NA
3       NA

But I get

col1    col2
1       NA
4       NA
4       NA
3       NA

What did I do wrong?

CodePudding user response:

The issue is that your case_when has no case for col1 == 4, so those rows match no condition and NA is returned. According to the docs:

If no cases match, NA is returned.

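To fix that, add a default via TRUE ~ col2 to your case_when so that unmatched rows keep their existing value. Using NA_character_ rather than a bare NA keeps both branches the same type (character), which older dplyr versions require: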

df <- data.frame(
  col1 = c(1, 4, 4, 3),
  col2 = c("a", "c", NA, NA)
)

library(dplyr)

df %>%
  mutate(col2 = case_when(
    col1 != 4 ~ NA_character_, 
    TRUE ~ col2))
#>   col1 col2
#> 1    1 <NA>
#> 2    4    c
#> 3    4 <NA>
#> 4    3 <NA>
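
Since there is only one condition here, the same result could probably be reached with if_else() instead of case_when(). This is a minimal sketch of that alternative, not part of the original answer; the name result is only illustrative:

library(dplyr)

df <- data.frame(
  col1 = c(1, 4, 4, 3),
  col2 = c("a", "c", NA, NA),
  stringsAsFactors = FALSE
)

# Keep col2 where col1 is 4; set it to a character NA everywhere else
result <- df %>%
  mutate(col2 = if_else(col1 == 4, col2, NA_character_))

With more than two outcomes, case_when with an explicit default remains the more readable choice.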