I would like to know how to combine two columns, x and y, so that I can make a new column, "A", under some conditions.
Column A looks as follows.
---- ---- ---- ----
| ID | x | y | A |
---- ---- ---- ----
| 1 | NA | 10 | 10 |
| 2 | 2 | NA | 2 |
| 3 | NA | NA | NA |
| 4 | 8 | 5 | 8 |
| 5 | 4 | 5 | 5 |
| 6 | 3 | 3 | 3 |
| 7 | NA | 4 | 4 |
| 8 | 7 | NA | 7 |
---- ---- ---- ----
df<-structure(list(ID = 1:8, x = c(NA, 2L, NA, 8L, 4L, 3L, NA, 7L
), y = c(10L, NA, NA, 5L, 5L, 3L, 4L, NA), A = c(10L, 2L, NA,
8L, 5L, 3L, 4L, 7L)), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
The rules for column A are as follows.
- When x is NA and y contains some value, A contains the value in y.
- Similarly, when y is NA and x contains some value, A contains the value in x.
- When x value is greater than y value, then A contains the greater value between the two.
- Similarly, when y value is greater than x value, then A contains the greater value between the two.
- When x and y are both NA, then A is NA.
- When x and y contain the same value, then A value contains the value.
I tried with following code
df<-df%>%
mutate(A= case_when(if_any(c(x,y), ~ .x == NA) ~ "NA", TRUE ~ A))
but then the code didn't work, and I failed to find out a way to add a condition 3 and 4 above.
Thank you very much in advance.
CodePudding user response:
We could use pmax
library(dplyr)
df %>%
mutate(A = pmax(x, y, na.rm = TRUE))
-output
ID x y A
<int> <int> <int> <int>
1: 1 NA 10 10
2: 2 2 NA 2
3: 3 NA NA NA
4: 4 8 5 8
5: 5 4 5 5
6: 6 3 3 3
7: 7 NA 4 4
8: 8 7 NA 7
Or in data.table
library(data.table)
df[, A := pmax(x, y, na.rm = TRUE)]
Or in base R
df$A <- with(df, pmax(x, y, na.rm = TRUE))