Can someone please help me create a new variable in R? I need to tell R something like this:
data <- data %>%
mutate(c = if_else(x == y, x, ifelse(x != y, y, ifelse(is.na(x), y, ifelse(is.na(y), x, NA))))
Of course, in this form, it doesn't work. The rules are: if the value in the first column equals the value in the second column, use the value from the first column; if they are not equal, use the value from the second column; if the first column is NA but the second column has a value, use the value from the second column; if the second column is NA but the first column has a value, use the value from the first column. (The values in x and y are characters.)
This is the outcome I want in the new variable "c":
x | y | c |
---|---|---|
1 | 1 | 1 |
1 | 3 | 3 |
NA | 5 | 5 |
6 | NA | 6 |
NA | NA | NA |
CodePudding user response:
You can actually simplify this to just one if_else():
library(dplyr)
data <- data %>%
mutate(c = if_else(is.na(y), x, y))
Result:
x y c
1 1 1 1
2 1 3 3
3 NA 5 5
4 6 NA 6
5 NA NA NA
Or, for the same result, use dplyr::coalesce():
data <- data %>%
mutate(c = coalesce(y, x))
Why this works
For your first condition, if_else(x == y, x, …): if x == y, it doesn't matter whether you take x or y, because they are by definition the same. So you could instead write this as if_else(x == y, y, …) and get the same result.
For your last condition, ifelse(is.na(y), x, NA): that final NA will only be reached if x == y, x != y, is.na(x), and is.na(y) are all FALSE, which is impossible. So you don't need it.
At this point, all of your remaining conditions yield y except for one: when y is NA. So we can write this as a single if_else() to reflect that.
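As a rough check of the reasoning above, here is a minimal sketch that rebuilds the question's sample data as character columns (as the asker describes) and confirms that the single if_else() and coalesce() agree. The column names via_if_else and via_coalesce are made up for illustration. It also shows that comparing with NA yields NA rather than TRUE or FALSE.

```r
library(dplyr)

# Sample data from the question, stored as character (assumption: the
# asker's real columns look like this).
data <- data.frame(
  x = c("1", "1", NA, "6", NA),
  y = c("1", "3", "5", NA, NA),
  stringsAsFactors = FALSE
)

# A comparison involving NA is itself NA.
data$x == data$y
# TRUE FALSE NA NA NA

result <- data %>%
  mutate(
    via_if_else  = if_else(is.na(y), x, y),  # take x only when y is missing
    via_coalesce = coalesce(y, x)            # first non-missing of y, then x
  )

# Both columns should hold "1", "3", "5", "6", NA.
identical(result$via_if_else, result$via_coalesce)
```

Putting y first in coalesce() matters: it is what makes y win whenever both values are present.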
CodePudding user response:
Using case_when from the dplyr package:
library(dplyr)

df <- data.frame(x = c(1, 1, NA, 6, NA),
                 y = c(1, 3, 5, NA, NA))
df <- df %>%
  # Rows where both x and y are NA match no condition and stay NA.
  mutate(c = case_when(is.na(x) & !is.na(y) ~ y,
                       !is.na(x) & is.na(y) ~ x,
                       x == y ~ x,
                       x != y ~ y))
Output:
> df
x y c
1 1 1 1
2 1 3 3
3 NA 5 5
4 6 NA 6
5 NA NA NA
CodePudding user response:
Try this. We could use ifelse twice, combined with coalesce:
df %>%
mutate(c = ifelse(x==y, x, y),
c = ifelse(is.na(x) | is.na(y), coalesce(x,y), c))
x y c
1 1 1 1
2 1 3 3
3 NA 5 5
4 6 NA 6
5 NA NA NA
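If you want to check any of these approaches against the table in the question, something like the following sketch might help. It assumes the numeric sample data used in the answers above. The names expected, checked, c_case_when and c_two_step are only illustrative.

```r
library(dplyr)

df <- data.frame(x = c(1, 1, NA, 6, NA),
                 y = c(1, 3, 5, NA, NA))

# Column c as given in the question's table.
expected <- c(1, 3, 5, 6, NA)

checked <- df %>%
  mutate(
    # case_when approach
    c_case_when = case_when(is.na(x) & !is.na(y) ~ y,
                            !is.na(x) & is.na(y) ~ x,
                            x == y ~ x,
                            x != y ~ y),
    # two-step ifelse + coalesce approach
    c_two_step = ifelse(x == y, x, y),
    c_two_step = ifelse(is.na(x) | is.na(y), coalesce(x, y), c_two_step)
  )

# Each comparison should come back TRUE if the approach matches the table.
isTRUE(all.equal(checked$c_case_when, expected))
isTRUE(all.equal(checked$c_two_step, expected))
```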