I have a table in R like this:
x Y
1 2 1
2 1 1
3 NA 1
4 2 NA
5 NA NA
6 2 2
7 1 1
and what I'm hoping to do is create a new column called XY, based on whether a 1 exists in either x or Y.
For example, if x is 1 and Y is 2, then XY should be 1; if x is NA and Y is 1, then XY should be 1. If both x and Y are 2, then XY should be 2.
The priority of the categorical values 1, 2 and NA is 1 > 2 > NA.
In short, my desired output looks like this:
x Y XY
1 2 1 1
2 1 1 1
3 NA 1 1
4 2 NA 2
5 NA NA NA
6 2 2 2
7 1 1 1
I'm new to R and trying to trim my data. Thank you for your help, I really appreciate it :)
CodePudding user response:
Try this
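library(dplyr)
# Replace NA with 0 so the row-wise comparisons never return NA,
# then take 1 if either column is 1, otherwise 2 if either column is 2.
df |>
  rowwise() |>
  mutate(z1 = coalesce(c_across(x), 0), z2 = coalesce(c_across(Y), 0)) |>
  mutate(XY = case_when(any(c_across(z1:z2) == 1) ~ 1,
                        any(c_across(z1:z2) == 2) ~ 2)) |>  # no match gives NA
  select(-z1, -z2) |>
  ungroup() -> ans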
- output
# A tibble: 7 × 3
x Y XY
<int> <int> <dbl>
1 2 1 1
2 1 1 1
3 NA 1 1
4 2 NA 2
5 NA NA NA
6 2 2 2
7 1 1 1
- data
df <- structure(list(x = c(2L, 1L, NA, 2L, NA, 2L, 1L), Y = c(1L, 1L,
1L, NA, NA, 2L, 1L)), row.names = c("1", "2", "3", "4", "5",
"6", "7"), class = "data.frame")
CodePudding user response:
You could do it with case_when (remembering that it evaluates its conditions from top to bottom and uses the first one that matches):
library(dplyr)
df <-
  df |>
  mutate(XY = case_when(x == 1 | Y == 1 ~ 1,
                        x == 2 | Y == 2 ~ 2,
                        TRUE ~ NA_real_))
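Or apply the same logic using base R functionality (here the higher-priority value 1 is assigned last, so it overwrites any 2 set by the previous line):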
df$XY <- NA
df$XY[df$x == 2 | df$Y == 2] <- 2
df$XY[df$x == 1 | df$Y == 1] <- 1
Output:
x Y XY
<dbl> <dbl> <dbl>
1 2 1 1
2 1 1 1
3 NA 1 1
4 2 NA 2
5 NA NA NA
6 2 2 2
7 1 1 1
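The conditions above still behave sensibly on rows containing NA because of how R's `|` operator treats missing values: NA | TRUE is TRUE, while NA | FALSE stays NA, and both case_when and logical subsetting with a single replacement value skip NA conditions. A minimal base R sketch of that behaviour, using small illustrative vectors (the names hit1 and hit2 are made up for this example):

```r
# Illustrative vectors covering the NA combinations from the question
x <- c(2, NA, 2, NA)
Y <- c(1, 1, NA, NA)

# `|` returns TRUE as soon as one side is TRUE, even if the other is NA
hit1 <- x == 1 | Y == 1   # is there a 1 in either column?
hit2 <- x == 2 | Y == 2   # is there a 2 in either column?

print(hit1)
print(hit2)
```

A row where one column is NA and the other holds the value being tested therefore still counts as a match, and a row where both are NA matches nothing and ends up as NA.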
Data:
library(readr)
df <- read_table("
x Y
2 1
1 1
NA 1
2 NA
NA NA
2 2
1 1")
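As a possible shortcut, and assuming the columns only ever contain 1, 2 or NA as in the question, the priority 1 > 2 > NA happens to coincide with taking the row-wise minimum while ignoring missing values. This is a different technique from the answers above, sketched here in base R with the same data rebuilt inline:

```r
# Same data as in the answers, rebuilt here so the example is self-contained
df <- data.frame(
  x = c(2, 1, NA, 2, NA, 2, 1),
  Y = c(1, 1, 1, NA, NA, 2, 1)
)

# pmin() takes the element-wise minimum; with na.rm = TRUE it ignores NA
# unless every value at that position is NA, in which case it returns NA
df$XY <- pmin(df$x, df$Y, na.rm = TRUE)

print(df)
```

This only works because the numeric order of the codes matches the desired priority; with other codes or a different priority, the case_when approach would be the safer choice.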