To fill an empty column of a data frame based on a condition on another column, I found the following solution. It works fine but is a bit ugly. Does anybody know a more elegant way to solve this?
base::set.seed(123)
test_df <- base::data.frame(vec1 = base::sample(base::seq(1, 100, 1), 50), vec2 = base::seq(1, 50, 1), vec3 = NA)
for (a in 1:base::nrow(test_df)){
  # select the specific row of the dataframe
  spc_test_df <- test_df[a, ]
  # check whether vec1 is at or below 25, or at or above 75
  if(spc_test_df$vec1 <= 25 | spc_test_df$vec1 >= 75){
    # if so, write 1
    spc_test_df$vec3 <- 1
  } else {
    # if not, write 0
    spc_test_df$vec3 <- 0
  }
  # write the specific row back to the dataframe
  test_df[a, ] <- spc_test_df
}
CodePudding user response:
There is no need for a for-loop here, because the comparisons and the | operator are vectorized and work on the whole column at once. Three options to solve this problem:
# option 1
test_df$vec3 <- (test_df$vec1 <= 25 | test_df$vec1 >= 75)
# option 2
test_df$vec3 <- as.integer(test_df$vec1 <= 25 | test_df$vec1 >= 75)
# option 3
test_df$vec3 <- ifelse(test_df$vec1 <= 25 | test_df$vec1 >= 75, 1, 0)
Options 2 and 3 give the output below; option 1 marks the same rows but stores TRUE/FALSE in vec3 instead of 1/0:
   vec1 vec2 vec3
1     5    1    1
2     6    2    1
3    61    3    0
4    20    4    1
....
47    3   47    1
48   55   48    0
49   44   49    0
50   97   50    1
(only the first and last four rows are shown)
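To see how the three options relate, here is a minimal sketch in base R that builds the condition once and checks that the results agree apart from their storage type. The names in_tail, opt1, opt2 and opt3 are only illustrative and do not appear in the original post.

```r
# Rebuild the example data from the question
set.seed(123)
test_df <- data.frame(vec1 = sample(1:100, 50), vec2 = 1:50, vec3 = NA)

# Build the condition once and reuse it (in_tail is a hypothetical helper name)
in_tail <- test_df$vec1 <= 25 | test_df$vec1 >= 75

opt1 <- in_tail                 # option 1: logical TRUE / FALSE
opt2 <- as.integer(in_tail)     # option 2: integer 1L / 0L
opt3 <- ifelse(in_tail, 1, 0)   # option 3: double 1 / 0

# The three results mark the same rows
stopifnot(all(opt1 == opt2), all(opt2 == opt3))

# They differ only in storage type
class(opt1)  # "logical"
class(opt2)  # "integer"
class(opt3)  # "numeric"

# Store whichever representation is needed downstream
test_df$vec3 <- opt2
```

If vec3 is only used as a flag for filtering, the logical version is probably enough; as.integer() is the shortest route when a 0/1 column is really needed.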