Home > OS >  Looping over multiple columns to generate a new variable based on a condition
Looping over multiple columns to generate a new variable based on a condition

Time:02-23

I am trying to generate a new column (variable) based on the value inside multiple columns. I have over 60 columns in the dataset and I wanted to subset the columns that I want to loop through.

The column variables I am using in my condition at all characters, and when a certain pattern is matched, to return a value of 1 in the new variable.

I am using when because I need to run multiple conditions on each column to return a value.

CODE:

df read.csv("sample.csv")

*#Generate new variable name*
df$new_var <- 0

*#For loop through columns 16 to 45* 
for (i in colnames(df[16:45])) {
   df <- df %>%
     mutate(new_var= 
         case_when(
           grepl("I8501", df[[i]]) ~ 1
           ))
}

This does not work as when I table the results, I only get 1 value matched.

My other attempt was using:

for (i in colnames(df[16:45])) {
   df <- df %>%
    mutate(new_var= 
            case_when(
               df[[i]] == "I8501" ~ 1
            ))
} 

Any other possible ways to run through multiple columns with multiple conditions and change the value of the variable accordingly? to be achieved using R ?

CodePudding user response:

If I'm understanding what you want, I think you just need to specify another case in your case_when() for keeping the existing values when things don't match "I8501". This is how I would do that:

df$new_var <- 0
for (index in (16:45)) {
  df <- df %>%
    mutate(
      new_var = case_when(
        grepl("I8501", df[[index]]) ~ 1,
        TRUE ~ df$new_var
      )
    )
}

I think a better way to do this though would be to use the ever useful apply():

has_match = apply(df[, 16:45], 1, function(x) sum(grepl("I8501", x)) > 0)
df$new_var = ifelse(has_match, 1, 0)
  • Related