How to fill column based on condition taking other columns into account?

Time:11-30

To fill an empty column of a data frame based on a condition that takes another column into account, I found the following solution. It works, but it is rather ugly. Does anybody know a more elegant way to solve this?

base::set.seed(123)
test_df <- base::data.frame(vec1 = base::sample(base::seq(1, 100, 1), 50), vec2 = base::seq(1, 50, 1), vec3 = NA)

for (a in 1:base::nrow(test_df)){
  # select the specific row of the dataframe
  spc_test_df <- test_df[a, ]
  # evaluate whether vec1 is at or below 25, or at or above 75
  if(spc_test_df$vec1 <= 25 | spc_test_df$vec1 >= 75){
    # if so, write 1
    spc_test_df$vec3 <- 1
  } else {
    # if not, write 0
    spc_test_df$vec3 <- 0
  }
  # write the specific row back to the dataframe
  test_df[a, ] <- spc_test_df
}

CodePudding user response:

There is no need for a for-loop as you can use vectorized solutions in this case. Three options on how to solve this problem:

# option 1: logical column (TRUE/FALSE)
test_df$vec3 <- (test_df$vec1 <= 25 | test_df$vec1 >= 75)

# option 2: integer column (1/0)
test_df$vec3 <- as.integer(test_df$vec1 <= 25 | test_df$vec1 >= 75)

# option 3: numeric column (1/0)
test_df$vec3 <- ifelse(test_df$vec1 <= 25 | test_df$vec1 >= 75, 1, 0)

Options 2 and 3 give the output below; option 1 stores the same result as TRUE/FALSE instead of 1/0:

   vec1 vec2 vec3
1     5    1    1
2     6    2    1
3    61    3    0
4    20    4    1

....

47    3   47    1
48   55   48    0
49   44   49    0
50   97   50    1

(only the first and last four rows are shown)
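
To check that the vectorized options agree with the original loop, a minimal sketch along these lines may help. The names outside, opt1, opt2, opt3 and loop_res are made up for this illustration and do not appear in the question; the loop is a simplified element-wise version of the one in the question, not a copy of it.

```r
# Rebuild the example data from the question
set.seed(123)
test_df <- data.frame(vec1 = sample(seq(1, 100, 1), 50),
                      vec2 = seq(1, 50, 1),
                      vec3 = NA)

# Evaluate the condition once for the whole column (hypothetical helper name)
outside <- test_df$vec1 <= 25 | test_df$vec1 >= 75

opt1 <- outside                 # logical vector: TRUE / FALSE
opt2 <- as.integer(outside)     # integer vector: 1L / 0L
opt3 <- ifelse(outside, 1, 0)   # double vector: 1 / 0

# Reference result from an explicit element-wise loop
loop_res <- numeric(nrow(test_df))
for (a in seq_len(nrow(test_df))) {
  loop_res[a] <- if (test_df$vec1[a] <= 25 || test_df$vec1[a] >= 75) 1 else 0
}

# The three options differ only in storage type
print(c(opt1 = class(opt1), opt2 = class(opt2), opt3 = class(opt3)))

# Each comparison is expected to be TRUE, since TRUE == 1 and FALSE == 0 in R
print(all(opt1 == loop_res))
print(all(opt2 == loop_res))
print(all(opt3 == loop_res))
```

Which option fits best probably depends on what the column is used for later: the logical version can be used directly for subsetting, for example test_df[opt1, ], while the 1/0 versions match the output of the original loop.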
