Home > other >  Combine two columns under some conditions
Combine two columns under some conditions

Time:10-09

I would like to know how to combine two columns, x and y, so that I can make a new column, "A", under some conditions.

Column A looks as follows.

 ---- ---- ---- ---- 
| ID | x  | y  | A  |
 ---- ---- ---- ---- 
|  1 | NA | 10 | 10 |
|  2 | 2  | NA | 2  |
|  3 | NA | NA | NA |
|  4 | 8  | 5  | 8  |
|  5 | 4  | 5  | 5  |
|  6 | 3  | 3  | 3  |
|  7 | NA | 4  | 4  |
|  8 | 7  | NA | 7  |
 ---- ---- ---- ---- 

df<-structure(list(ID = 1:8, x = c(NA, 2L, NA, 8L, 4L, 3L, NA, 7L
), y = c(10L, NA, NA, 5L, 5L, 3L, 4L, NA), A = c(10L, 2L, NA, 
                                                 8L, 5L, 3L, 4L, 7L)), row.names = c(NA, -8L), class = c("data.table", 
                                                                                                         "data.frame"))

The rules for column A are as follows.

  1. When x is NA and y contains some value, A contains the value in y.
  2. Similarly, when y is NA and x contains some value, A contains the value in x.
  3. When x value is greater than y value, then A contains the greater value between the two.
  4. Similarly, when y value is greater than x value, then A contains the greater value between the two.
  5. When x and y are both NA, then A is NA.
  6. When x and y contain the same value, then A value contains the value.

I tried with following code

df<-df%>%
  mutate(A= case_when(if_any(c(x,y), ~ .x == NA) ~ "NA", TRUE ~ A))

but then the code didn't work, and I failed to find out a way to add a condition 3 and 4 above.

Thank you very much in advance.

CodePudding user response:

We could use pmax

library(dplyr)
df %>%
    mutate(A = pmax(x, y, na.rm = TRUE))

-output

    ID     x     y     A
   <int> <int> <int> <int>
1:     1    NA    10    10
2:     2     2    NA     2
3:     3    NA    NA    NA
4:     4     8     5     8
5:     5     4     5     5
6:     6     3     3     3
7:     7    NA     4     4
8:     8     7    NA     7

Or in data.table

library(data.table)
df[, A := pmax(x, y,  na.rm = TRUE)]

Or in base R

df$A <- with(df, pmax(x, y, na.rm = TRUE))
  • Related