I have a dataframe with two columns:
df <- data.frame (a = c(NA, 0, NA, NA, NA, NA, 0, 0, NA),
b = c(1, 2, 5, 3, 6, 3, 2, 1, 4))
a b
1 NA 1
2 0 2
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 2
8 0 1
9 NA 4
When the value in column a is 0, I want to replace the value in column b; desired end result is:
a b
1 NA 1
2 0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 0
8 0 0
9 NA 4
I tried various combinations of mutate with ifelse and case_when, and all but one replaces all of column b with column a values, 0 as well as NA.
Failed attemps:
df %>%
mutate(b = case_when(a == 0 ~ 0))
df %>%
mutate(b = case_when(a == 0 ~ 0,
TRUE ~ as.numeric(as.character(a))))
df %>%
mutate(b = ifelse(a==0, a, b))
All result in:
a b
1 NA NA
2 0 0
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 0 0
8 0 0
9 NA NA
After much consternation, I finally found a solution that produces the result I'm after:
df <- df %>%
mutate(b = ifelse(is.na(a), b, a))
a b
1 NA 1
2 0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 0
8 0 0
9 NA 4
But I'm still perplexed as to why the others did not work as expected. Would love some insight here.
CodePudding user response:
Using %in% instead of == can be useful where there are NA values.
In base R the following will give you what you want.
df$b[df$a %in% 0] <- 0
Using this in dplyr is slightly more complicated than base R, but simpler than the previous solutions:
library(dplyr)
df <- df %>% mutate(b = if_else(a %in% 0, 0, b))
The reason for the problems is that NA == 0 gives NA, not FALSE. NA %in% 0 gives FALSE.
CodePudding user response:
A possible solution:
library(dplyr)
df %>%
mutate(b = if_else(a == 0 & !is.na(a), 0, b))
#> a b
#> 1 NA 1
#> 2 0 0
#> 3 NA 5
#> 4 NA 3
#> 5 NA 6
#> 6 NA 3
#> 7 0 0
#> 8 0 0
#> 9 NA 4
CodePudding user response:
In general any operation on an NA
becomes an NA
, so when comparing vectors that have NA
the results will be NA
where either of the original items was NA
.
If you're willing to eschew dplyr
you can do this in base R:
df$b <- ifelse(
is.na(df$a),
df$b,
ifelse(
df$a == 0,
0,
df$b
)
)