Generate random binary variable conditionally in R

I would like to add an extra column, z, based on the following conditions:

if x == "A", generate a binary variable assuming the prob of success (=1) is 0.5
if x == "C" & y == "N", generate a binary variable assuming the prob of success is 0.25.

# Sample data
library(tibble)
df <- tibble(
  x = c("A", "C", "C", "B", "C", "A", "A"),
  y = c("Y", "N", "Y", "N", "N", "N", "Y"))

Currently, my approach uses filter, then set.seed and rbinom, and finally rbind. I am looking for a more elegant solution that doesn't involve subsetting and re-joining the data.

CodePudding user response:

You may put your logic into a simple if / else structure and wrap it in a function g().

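# g() receives one row as a named character vector (as passed by apply)
g <- \(z) {
  if (z['x'] == 'A') {
    rbinom(1, 1, .5)   # one draw with P(1) = 0.5
  } else if (z['x'] == 'C' && z['y'] == 'N') {
    rbinom(1, 1, .25)  # one draw with P(1) = 0.25
  } else {
    NA                 # no condition matched
  }
}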

set.seed(42)
transform(df, z=apply(df, 1, g))
#   x y  z
# 1 A Y  1
# 2 C N  1
# 3 C Y NA
# 4 B N NA
# 5 C N  0
# 6 A N  1
# 7 A Y  1

CodePudding user response:

This is a good case for dplyr::case_when since you are using tidyverse functions.

library(dplyr)
set.seed(1)
df %>% 
  mutate(z = case_when(x == "A" ~ rbinom(n(), 1, 0.5),
                       x == "C" & y == "N" ~ rbinom(n(), 1, 0.25)))

# A tibble: 7 x 3
# Rowwise: 
  x     y         z
  <chr> <chr> <int>
1 A     Y         0
2 C     N         1
3 C     Y        NA
4 B     N        NA
5 C     N         0
6 A     N         0
7 A     Y         1

CodePudding user response:

You can try a nested ifelse, as below:

transform(
    df,
    z = suppressWarnings(
        rbinom(
            nrow(df), 1,
            ifelse(x == "A", 0.5,
                ifelse(x == "C" & y == "N", 0.25, NA)
            )
        )
    )
)

which gives

  x y  z
1 A Y  1
2 C N  0
3 C Y NA
4 B N NA
5 C N  1
6 A N  1
7 A Y  1
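
As a variation on the nested ifelse idea, here is a minimal base R sketch that avoids suppressWarnings by building the probability vector first and drawing only for the rows that have one. The names p and ok are illustrative and the seed is arbitrary; neither comes from the answers above.

```r
# Sample data from the question, as a base data frame
df <- data.frame(
  x = c("A", "C", "C", "B", "C", "A", "A"),
  y = c("Y", "N", "Y", "N", "N", "N", "Y")
)

# Per-row success probability; NA where neither condition applies
p <- ifelse(df$x == "A", 0.5,
            ifelse(df$x == "C" & df$y == "N", 0.25, NA_real_))

# Draw only for rows with a defined probability
ok <- !is.na(p)
set.seed(123)  # arbitrary seed, for reproducibility only
df$z <- NA_integer_
df$z[ok] <- rbinom(sum(ok), 1, p[ok])
df
```

Rows that match neither condition keep NA in z, and because rbinom never sees an NA probability, no warning is raised.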