dplyr mutate with ifelse or case_when not working as expected?-CodePudding

I have a dataframe with two columns:

df <- data.frame (a  = c(NA, 0, NA, NA, NA, NA, 0, 0, NA),
                  b = c(1, 2, 5, 3, 6, 3, 2, 1, 4))

   a b
1 NA 1
2  0 2
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7  0 2
8  0 1
9 NA 4

When the value in column a is 0, I want to replace the value in column b; desired end result is:

   a b
1 NA 1
2  0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7  0 0
8  0 0
9 NA 4

I tried various combinations of mutate with ifelse and case_when, and all but one replaces all of column b with column a values, 0 as well as NA.

Failed attemps:

df %>%
  mutate(b = case_when(a == 0 ~ 0))

df %>%
  mutate(b = case_when(a == 0 ~ 0, 
                       TRUE ~ as.numeric(as.character(a))))

df %>%
  mutate(b = ifelse(a==0, a, b))

All result in:

   a  b
1 NA NA
2  0  0
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7  0  0
8  0  0
9 NA NA

After much consternation, I finally found a solution that produces the result I'm after:

df <- df %>%
  mutate(b = ifelse(is.na(a), b, a))

   a b
1 NA 1
2  0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7  0 0
8  0 0
9 NA 4

But I'm still perplexed as to why the others did not work as expected. Would love some insight here.

CodePudding user response：

Using %in% instead of == can be useful where there are NA values.

In base R the following will give you what you want.

df$b[df$a %in% 0] <- 0

Using this in dplyr is slightly more complicated than base R, but simpler than the previous solutions:

library(dplyr)
df <- df %>% mutate(b = if_else(a %in% 0, 0, b))

The reason for the problems is that NA == 0 gives NA, not FALSE. NA %in% 0 gives FALSE.

CodePudding user response：

A possible solution:

library(dplyr)

df %>% 
  mutate(b = if_else(a == 0 & !is.na(a), 0, b)) 

#>    a b
#> 1 NA 1
#> 2  0 0
#> 3 NA 5
#> 4 NA 3
#> 5 NA 6
#> 6 NA 3
#> 7  0 0
#> 8  0 0
#> 9 NA 4

CodePudding user response：

In general any operation on an NA becomes an NA, so when comparing vectors that have NA the results will be NA where either of the original items was NA.

If you're willing to eschew dplyr you can do this in base R:

df$b <- ifelse(
          is.na(df$a), 
          df$b, 
          ifelse(
            df$a == 0, 
            0,  
            df$b
          )
        )