I am trying to use a loop to create a new variable in an existing data frame that is conditional on the values of the variables included in the loop. The logic makes sense to me but I am getting unexpected results.
Take the following data frame as an example:
> df
  var1 var2 var3 var4
1    0    1    0    1
2    1    0    0    1
3    1    1    0    1
4    0    0    1    0
5    0    1    1    0
I want to create a new variable (var5) that is equal to 0 if any of var1 through var4 is equal to 1. Otherwise, var5 should be coded as missing (NA). I wrote the following loop:
for (var in c("var1", "var2", "var3", "var4")) {
  df$var5 <- ifelse(
    df[, var] == 1, 0, NA
  )
}
This logic seems straightforward to me, as it is similar to a "foreach" loop in Stata, but my results are unexpected:
> df
  var1 var2 var3 var4 var5
1    0    1    0    1    0
2    1    0    0    1    0
3    1    1    0    1    0
4    0    0    1    0   NA
5    0    1    1    0   NA
For some reason, the loop seems to apply the condition only to the last variable in the list (var4). Observations 4 and 5 should be 0, because each of those rows contains a 1 in one of the listed variables.
I'm sure there is something simple I am missing, but does anyone know how to correct this?
Answer:
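Based on your comment, here is how you could solve your problem. Start var5 as NA, then on each pass set it to 0 only in the rows where the current variable equals 1, so later passes never undo earlier ones: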
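df$var5 = NA  # start every row as missing
for (var in c("var1", "var2", "var3", "var4")) {
  # set var5 to 0 only where this variable is 1;
  # rows already set to 0 by an earlier pass stay 0
  df[df[[var]] == 1, "var5"] = 0
}
df
  var1 var2 var3 var4 var5
1    0    1    0    1    0
2    1    0    0    1    0
3    1    1    0    1    0
4    0    0    1    0    0
5    0    1    1    0    0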
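One caveat with this logical-index approach, as far as I understand R's data-frame assignment method: if one of the variables contains NA, then df[[var]] == 1 is NA for that row, and assigning through a logical index containing NA stops with an error about missing values in subscripted assignments. A minimal sketch of a workaround, assuming hypothetical data with an NA in var1, wraps the test in which(), which keeps only the positions where the test is TRUE:

```r
# Hypothetical data: like the question's df, but row 5 has NA in var1
# and no 1 in any of var1:var4
df <- data.frame(
  var1 = c(0, 1, 1, 0, NA),
  var2 = c(1, 0, 1, 0, 0),
  var3 = c(0, 0, 0, 1, 0),
  var4 = c(1, 1, 1, 0, 0)
)

df$var5 <- NA  # start every row as missing
for (var in c("var1", "var2", "var3", "var4")) {
  # which() returns the row numbers where the test is TRUE and drops NA,
  # so a missing value in var cannot break the assignment
  df[which(df[[var]] == 1), "var5"] <- 0
}
df$var5
#> [1]  0  0  0  0 NA
```

Row 5 stays NA because none of its non-missing values is 1.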
Answer:
With each pass of your loop, you overwrite the result of the previous pass, so the passes for var1 through var3 are wasted: var5 ends up depending only on var4.
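A quick way to confirm the overwriting is to rerun the question's loop and compare var5 with the same ifelse() applied to var4 alone. This sketch rebuilds the example data by hand from the printed df in the question:

```r
# Example data from the question, rebuilt by hand
df <- data.frame(
  var1 = c(0, 1, 1, 0, 0),
  var2 = c(1, 0, 1, 0, 1),
  var3 = c(0, 0, 0, 1, 1),
  var4 = c(1, 1, 1, 0, 0)
)

# The loop from the question: each pass replaces var5 entirely
for (var in c("var1", "var2", "var3", "var4")) {
  df$var5 <- ifelse(df[, var] == 1, 0, NA)
}

# var5 is exactly what the final pass (var4) produced
identical(df$var5, ifelse(df$var4 == 1, 0, NA))
#> [1] TRUE
```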
Based on your logic, I suggest rowSums and a test. You say that var5 should be 0 if any of var1:var4 are 1, and NA otherwise, so
df$var5 <- ifelse(rowSums(df == 1) > 0, 0, NA)
df
#   var1 var2 var3 var4 var5
# 1    0    1    0    1    0
# 2    1    0    0    1    0
# 3    1    1    0    1    0
# 4    0    0    1    0    0
# 5    0    1    1    0    0
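To see what the one-liner does, it can help to look at the intermediate pieces. In this sketch (data rebuilt by hand from the question), df == 1 gives a logical matrix, rowSums() counts the TRUE values in each row, and > 0 turns that count into the "at least one 1" test:

```r
# Example data from the question, rebuilt by hand
df <- data.frame(
  var1 = c(0, 1, 1, 0, 0),
  var2 = c(1, 0, 1, 0, 1),
  var3 = c(0, 0, 0, 1, 1),
  var4 = c(1, 1, 1, 0, 0)
)

hits <- df == 1                       # logical matrix, TRUE where a cell equals 1
counts <- rowSums(hits)               # number of 1s per row: 2, 2, 3, 1, 2
df$var5 <- ifelse(counts > 0, 0, NA)  # 0 if the row has at least one 1, else NA
```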
If there are other columns in df that you do not want considered in this logic, then we can instead do
df$var5 <- ifelse(rowSums(subset(df, select = var1:var4) == 1) > 0, 0, NA)
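To illustrate why restricting the columns matters, this sketch adds a hypothetical id column that is always 1 and a hypothetical sixth row with no 1s in var1:var4. Counting over the whole data frame lets id mark that row as a hit, whereas the subset() version leaves it NA:

```r
# Hypothetical data: the question's rows plus an all-zero row 6,
# and an id column that must be ignored
df <- data.frame(
  id   = rep(1, 6),
  var1 = c(0, 1, 1, 0, 0, 0),
  var2 = c(1, 0, 1, 0, 1, 0),
  var3 = c(0, 0, 0, 1, 1, 0),
  var4 = c(1, 1, 1, 0, 0, 0)
)

# Counting over every column: id = 1 wrongly marks row 6 as a hit
all_cols <- ifelse(rowSums(df == 1) > 0, 0, NA)

# Counting over var1:var4 only: row 6 stays NA, as intended
df$var5 <- ifelse(rowSums(subset(df, select = var1:var4) == 1) > 0, 0, NA)
```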
If you must use a for loop (discouraged), then you need to include the previous results in your logic.
df$var5 <- NA  # start every row as missing
for (V in c("var1", "var2", "var3", "var4")) {
  # keep the 0 from any earlier pass, or set 0 if this variable is 1
  df$var5 <- ifelse(!is.na(df$var5) | df[[V]] == 1, 0, NA)
}
df
#   var1 var2 var3 var4 var5
# 1    0    1    0    1    0
# 2    1    0    0    1    0
# 3    1    1    0    1    0
# 4    0    0    1    0    0
# 5    0    1    1    0    0
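The loop above carries var5 forward from one pass to the next, which is a fold over the columns. Base R's Reduce() expresses that idea without an explicit loop; this is an alternative swapped in for illustration, not part of the answer's code, and the data are rebuilt by hand from the question:

```r
# Example data from the question, rebuilt by hand
df <- data.frame(
  var1 = c(0, 1, 1, 0, 0),
  var2 = c(1, 0, 1, 0, 1),
  var3 = c(0, 0, 0, 1, 1),
  var4 = c(1, 1, 1, 0, 0)
)
vars <- c("var1", "var2", "var3", "var4")

# One logical vector per column: TRUE where that column equals 1
hits <- lapply(df[vars], function(x) x == 1)

# Combine them with elementwise OR, carrying the running result forward
# the same way the for loop carries var5 forward
any_hit <- Reduce(`|`, hits)

df$var5 <- ifelse(any_hit, 0, NA)
```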