Loops with ifelse() statement to create conditional variable in an existing data frame

Time:11-05

I am trying to use a loop to create a new variable in an existing data frame that is conditional on the values of the variables included in the loop. The logic makes sense to me but I am getting unexpected results.

Take the following data frame as an example:

> df
  var1 var2 var3 var4
1    0    1    0    1
2    1    0    0    1
3    1    1    0    1
4    0    0    1    0
5    0    1    1    0

I want to create a new variable (var5) that is equal to 0 if any of var1 through var4 is equal to 1. Otherwise, I want this variable to be coded as a missing value (NA). I wrote the following loop:

for (var in c("var1", "var2", "var3", "var4")) {
  df$var5 <- ifelse(
    df[, var] == 1, 0, NA
  )
}

This logic seems straightforward to me, as it is similar to a "foreach" loop in Stata, but my results are unexpected:

> for (var in c("var1", "var2", "var3", "var4")) {
    df$var5 <- ifelse(
      df[, var] == 1, 0, NA
    )
  }
> df
  var1 var2 var3 var4 var5
1    0    1    0    1    0
2    1    0    0    1    0
3    1    1    0    1    0
4    0    0    1    0   NA
5    0    1    1    0   NA

For some reason, the loop seems to apply the conditional statement only to the last element of "var". Observations 4 and 5 should be 0, given that those rows contain a 1 in at least one of the variables specified.

I'm sure there is something simple I am missing, but does anyone know how to correct this?

CodePudding user response:

Based on your comment, here is how you could solve your problem:

df$var5 = NA
for(var in c("var1", "var2", "var3", "var4")) {
  df[df[[var]]==1, "var5"] = 0
}

  var1 var2 var3 var4 var5
1    0    1    0    1    0
2    1    0    0    1    0
3    1    1    0    1    0
4    0    0    1    0    0
5    0    1    1    0    0

CodePudding user response:

With each pass of your loop, you are overwriting the results from the previous pass, so the tests on var1 through var3 are discarded and only the test on var4 survives.
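
To make the overwriting concrete, here is a minimal sketch that rebuilds the example data frame from the question and compares the loop's result with a single ifelse() on var4 alone. The names loop_result and last_only are just illustrative.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  var1 = c(0, 1, 1, 0, 0),
  var2 = c(1, 0, 1, 0, 1),
  var3 = c(0, 0, 0, 1, 1),
  var4 = c(1, 1, 1, 0, 0)
)

# The original loop: every pass replaces var5 wholesale.
for (var in c("var1", "var2", "var3", "var4")) {
  df$var5 <- ifelse(df[, var] == 1, 0, NA)
}
loop_result <- df$var5

# The same test applied to the last column only, no loop.
last_only <- ifelse(df$var4 == 1, 0, NA)

# Both vectors are the same, so the first three passes had no lasting effect.
identical(loop_result, last_only)
```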

Based on your logic, I suggest rowSums and a test. You say that the value of var5 should be a 0 if any of var1:var4 are 1, else it should be NA, so

df$var5 <- ifelse(rowSums(df == 1) > 0, 0, NA)
df
#   var1 var2 var3 var4 var5
# 1    0    1    0    1    0
# 2    1    0    0    1    0
# 3    1    1    0    1    0
# 4    0    0    1    0    0
# 5    0    1    1    0    0

If there are other columns in df that you do not want considered in this logic, then we can instead do

df$var5 <- ifelse(rowSums(subset(df, select = var1:var4) == 1) > 0, 0, NA)
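
One case where this matters: if df still carries a var5 column from the failed loop, the whole-frame version picks up its NA values. The sketch below shows that and selects the columns by name instead. The vector cols is an illustrative name, and na.rm = TRUE is an assumption that a missing input should count as "not 1"; drop it if you would rather have NA propagate.

```r
# Example data, including the stale var5 left behind by the original loop.
df <- data.frame(
  var1 = c(0, 1, 1, 0, 0),
  var2 = c(1, 0, 1, 0, 1),
  var3 = c(0, 0, 0, 1, 1),
  var4 = c(1, 1, 1, 0, 0),
  var5 = c(0, 0, 0, NA, NA)
)

# Testing the whole frame includes var5, so its NAs leak into rows 4 and 5.
stale <- ifelse(rowSums(df == 1) > 0, 0, NA)

# Restricting the test to the named columns avoids that.
cols <- c("var1", "var2", "var3", "var4")  # illustrative name
df$var5 <- ifelse(rowSums(df[cols] == 1, na.rm = TRUE) > 0, 0, NA)
df
```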

If you must use a for loop (discouraged), then you need to include the previous results in your logic.

df$var5 <- NA
for (V in c("var1", "var2", "var3", "var4")) {
  df$var5 <- ifelse(!is.na(df$var5) | df[[V]] == 1, 0, NA)
}
df
#   var1 var2 var3 var4 var5
# 1    0    1    0    1    0
# 2    1    0    0    1    0
# 3    1    1    0    1    0
# 4    0    0    1    0    0
# 5    0    1    1    0    0
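
Another way to carry the previous results forward, sketched here as one possible variant rather than the only option, is to accumulate a logical flag across the passes and convert it to 0/NA once at the end. The name hit is illustrative.

```r
# Example data frame from the question.
df <- data.frame(
  var1 = c(0, 1, 1, 0, 0),
  var2 = c(1, 0, 1, 0, 1),
  var3 = c(0, 0, 0, 1, 1),
  var4 = c(1, 1, 1, 0, 0)
)

# Start with "no 1 seen yet" for every row.
hit <- rep(FALSE, nrow(df))

# Each pass ORs in the current column, so earlier matches are kept.
for (V in c("var1", "var2", "var3", "var4")) {
  hit <- hit | df[[V]] == 1
}

# Convert the flag to the requested coding: 0 if any 1 was seen, else NA.
df$var5 <- ifelse(hit, 0, NA)
df
```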