How to merge two column in a table based on certain condition in R-CodePudding

I have a table in R like this:

      x     Y
1     2     1
2     1     1
3    NA     1
4     2    NA
5     1     2
6     2     2
7     1     1

and what I'm hoping to do is make a new column called xy which bases on if there is a 1 exist in either x or y.

For example, if x is 1 and y is 2 then the xy should be 1 ; if x is NAand y is 1 then the xyshould be 1. If both x and y is 2 then xyshould be 2.

The priority of the categorical variables 1, 2 and NA is 1>2>NA. In short what my desired output looks like this:

      x     Y     XY
1     2     1     1
2     1     1     1
3    NA     1     1
4     2    NA     2
5     NA   NA     NA
6     2     2     2
7     1     1     1

I'm new to R and trying to trim my data. Thank you for your help! I'm really appreciated:)

CodePudding user response：

Try this

library(dplyr)

df |> rowwise() |>
   mutate(z1 = coalesce(c_across(x) , 0) , z2 = coalesce(c_across(Y) , 0)) |>
   mutate(XY = case_when(any(c_across(z1:z2) == 1) ~ 1 , any(c_across(z1:z2) == 2) ~ 2)) |>
   select(-z1 , -z2) |> ungroup() -> ans

output

# A tibble: 7 × 3
      x     Y    XY
  <int> <int> <dbl>
1     2     1     1
2     1     1     1
3    NA     1     1
4     2    NA     2
5    NA    NA    NA
6     2     2     2
7     1     1     1

data

df <- structure(list(x = c(2L, 1L, NA, 2L, NA, 2L, 1L), Y = c(1L, 1L, 
1L, NA, NA, 2L, 1L)), row.names = c("1", "2", "3", "4", "5", 
"6", "7"), class = "data.frame")

CodePudding user response：

You could do it with a case_when (remembering that it evaluates from the bottom and up):

library(dplyr)

df <-
  df |> 
  mutate(XY = case_when(x == 1 | Y == 1 ~ 1,
                        x == 2 | Y == 2 ~ 2, 
                        TRUE ~ NA_real_))

Or apply the same logic using base functionalities:

df$XY <- NA
df$XY[df$x == 2 | df$Y == 2] <- 2
df$XY[df$x == 1 | df$Y == 1] <- 1

Output:

      x     Y    XY
  <dbl> <dbl> <dbl>
1     2     1     1
2     1     1     1
3    NA     1     1
4     2    NA     2
5    NA    NA    NA
6     2     2     2
7     1     1     1

Data:

library(readr)

df <- read_table("
                  x     Y
                  2     1
                  1     1
                  NA    1
                  2     NA
                  NA    NA
                  2     2
                  1     1")