Replace logical values conditionally in R

Time:02-08

I am sure this question has been asked before and has an easy solution, but I can't seem to find it.

I am trying to conditionally replace the logical value of a variable based on the value of other variables in the data. Specifically, I am trying to determine eligibility based on survey responses.

I have created my eligibility variable in the data frame screen:

screen$eligible <- ifelse(
  (screen$age > 17 & screen$age < 23)
  & (screen$alcohol > 3 | screen$marijuana > 3)
  & (screen$country == 0 | screen$ageus < 12)
  & (screen$county_1 == 17 | screen$county_1 == 27 | screen$county_1 == 31)
  & (screen$residence_1 == 47),
  TRUE,
  FALSE)

And now, based on study changes, I would like to further limit eligibility. I tried the code below, and it works in part, but it appears that I am introducing NAs to my eligibility variable and missing out on folks who should be eligible.

screen$eligible <- ifelse(screen$eligible == TRUE,
                          ifelse((screen$gender_1 == 1 & screen$age > 18) |
                                 (screen$gender_8 == 1 & screen$age > 20),
                                 FALSE, TRUE),
                          FALSE)

I ultimately want TRUE or FALSE values.

Two questions:

  1. Is there a clearer or more concise way to update the code to update my eligibility requirements?
  2. Any ideas as to why I might be introducing NAs?

CodePudding user response:

1. Is there a clearer or more concise way to update the code to update my eligibility requirements?

If you ever find yourself writing x = ifelse(condition, TRUE, FALSE), as you are here, you can simply write x = condition, because a comparison already returns a logical vector. Also, your three county_1 == x tests can be replaced with a single county_1 %in% c(x, y, z). The only difference is that %in% returns FALSE, not NA, when county_1 is missing. So your first code block could be written as:

screen$eligible <- (screen$age > 17 & screen$age < 23) &
                   (screen$alcohol > 3 | screen$marijuana > 3) &
                   (screen$country == 0 | screen$ageus < 12) &
                   screen$county_1 %in% c(17, 27, 31) &
                   (screen$residence_1 == 47)
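You can check both rewrites on a small made-up vector using base R only. The values below are hypothetical and do not come from the question. The sketch confirms that ifelse(cond, TRUE, FALSE) returns cond unchanged, including any NA. It also shows the one case where %in% differs from a chain of == tests: a missing value.

```r
# Hypothetical toy values; NA included on purpose
age    <- c(15, 19, 25, NA)
county <- c(17, 5, 31, NA)

# ifelse(cond, TRUE, FALSE) gives back cond itself (NA stays NA)
via_ifelse <- ifelse(age > 17 & age < 23, TRUE, FALSE)
direct     <- age > 17 & age < 23
print(identical(via_ifelse, direct))  # TRUE

# A chain of == tests versus %in%: they differ only where county is NA
via_eq <- county == 17 | county == 27 | county == 31
via_in <- county %in% c(17, 27, 31)
print(via_eq)  # TRUE FALSE TRUE NA
print(via_in)  # TRUE FALSE TRUE FALSE
```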

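Likewise, your second code block can be simplified. Your nested ifelse() returns FALSE when the exclusion condition is TRUE, so the exclusion has to be negated with !:

screen$eligible <- screen$eligible &
                   !((screen$gender_1 == 1 & screen$age > 18) |
                     (screen$gender_8 == 1 & screen$age > 20))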
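The nested ifelse() in the question keeps a respondent only when they were already eligible and the exclusion condition is FALSE. That is the same as eligible & !(exclusion). One way to check the equivalence is to run both versions on a small hypothetical data frame with made-up values. The last respondent has NA in both gender columns, which also shows how an NA in the exclusion condition becomes an NA in eligible.

```r
# Hypothetical respondents; the last one left both gender items blank
screen <- data.frame(
  eligible = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  age      = c(19, 20, 22, 19, 21),
  gender_1 = c(1, 0, 0, 1, NA),
  gender_8 = c(0, 1, 1, 0, NA)
)

# The question's nested ifelse()
nested <- ifelse(screen$eligible == TRUE,
                 ifelse((screen$gender_1 == 1 & screen$age > 18) |
                        (screen$gender_8 == 1 & screen$age > 20),
                        FALSE, TRUE),
                 FALSE)

# Vectorized equivalent: already eligible AND NOT excluded
excluded   <- with(screen, (gender_1 == 1 & age > 18) | (gender_8 == 1 & age > 20))
vectorized <- screen$eligible & !excluded

print(nested)                        # FALSE TRUE FALSE FALSE NA
print(identical(nested, vectorized)) # TRUE
```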

2. Any ideas as to why I might be introducing NAs?

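It's hard to say without seeing your data, but the NAs most likely mean that one or more of the variables in the condition (gender_1, gender_8, age, or eligible itself) is NA for some respondents. Any comparison with NA, such as NA == 1, returns NA, and ifelse() returns NA wherever its test is NA. R's & and | resolve an NA only when the other operand settles the result: FALSE & NA is FALSE and TRUE | NA is TRUE, but TRUE & NA is NA.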
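The sketch below shows R's handling of NA in logical operations. It then shows two common ways to end up with strictly TRUE/FALSE values. The data are hypothetical. Which fix is right depends on your study rules, and that choice is an assumption here, not something R decides for you. Using %in% treats a missing gender answer as "not 1", so in the exclusion step that respondent stays eligible. Recoding a leftover NA in eligible to FALSE does the opposite and makes them ineligible.

```r
# R's three-valued logic: NA means "value unknown"
print(TRUE & NA)                # NA
print(FALSE & NA)               # FALSE: the result is FALSE whatever NA is
print(TRUE | NA)                # TRUE
print(ifelse(NA, TRUE, FALSE))  # NA

# Hypothetical respondents; the third skipped the gender_1 item
gender_1 <- c(1, 0, NA)
age      <- c(19, 19, 19)

with_eq <- gender_1 == 1 & age > 18    # TRUE FALSE NA
with_in <- gender_1 %in% 1 & age > 18  # TRUE FALSE FALSE (NA counts as "not 1")

# Alternatively, recode any remaining NA in the final result to FALSE
eligible <- c(TRUE, NA, FALSE)
eligible[is.na(eligible)] <- FALSE     # TRUE FALSE FALSE
```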

CodePudding user response:

Continuing from what @zephryl wrote, wrapping the expression in with() makes it even more readable, because you can refer to the columns of screen by name:

screen$eligible <- with(screen, 
   (age > 17 & age < 23) 
   & (alcohol > 3 | marijuana > 3)
   & (country == 0 | ageus < 12) 
   & county_1 %in% c(17, 27, 31)
   & (residence_1 == 47))
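The sketch below runs the with() version end to end on a small hypothetical data frame. It uses the question's column names, but the values are made up. It also applies with() to the follow-up exclusion rule. The ! is there because the question's nested ifelse() sets eligible to FALSE when the exclusion condition is TRUE. The first respondent's ageus is NA, yet they are still eligible, because country == 0 is TRUE and TRUE | NA is TRUE.

```r
# Hypothetical toy data with the column names used in the question
screen <- data.frame(
  age         = c(19, 21, 25),
  alcohol     = c(4, 0, 5),
  marijuana   = c(0, 5, 0),
  country     = c(0, 1, 0),
  ageus       = c(NA, 10, NA),
  county_1    = c(17, 27, 31),
  residence_1 = c(47, 47, 47),
  gender_1    = c(0, 1, 0),
  gender_8    = c(1, 0, 0)
)

# Step 1: original criteria; with() lets us use bare column names
screen$eligible <- with(screen,
  (age > 17 & age < 23) &
  (alcohol > 3 | marijuana > 3) &
  (country == 0 | ageus < 12) &
  county_1 %in% c(17, 27, 31) &
  (residence_1 == 47))
print(screen$eligible)  # TRUE TRUE FALSE

# Step 2: drop anyone who meets the new exclusion rule
screen$eligible <- screen$eligible &
  !with(screen, (gender_1 == 1 & age > 18) | (gender_8 == 1 & age > 20))
print(screen$eligible)  # TRUE FALSE FALSE
```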
As for the NAs, this shows which columns contain any:
sapply(screen, anyNA)
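The example below uses a toy data frame with hypothetical values. sapply(screen, anyNA) flags the columns that contain any NA. colSums(is.na(...)) counts the NAs in each column, and complete.cases() identifies the rows that have one. All three are base R.

```r
# Hypothetical data: only the gender columns have missing values
screen <- data.frame(
  age      = c(19, 20, 21),
  gender_1 = c(1, NA, NA),
  gender_8 = c(0, 1, NA)
)

print(sapply(screen, anyNA))          # age FALSE, gender_1 TRUE, gender_8 TRUE
print(colSums(is.na(screen)))         # age 0, gender_1 2, gender_8 1
print(which(!complete.cases(screen))) # rows 2 and 3
```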