I am trying to generate a new column (variable) based on the value inside multiple columns. I have over 60 columns in the dataset and I wanted to subset the columns that I want to loop through.
The column variables I am using in my condition at all characters, and when a certain pattern is matched, to return a value of 1 in the new variable.
I am using when because I need to run multiple conditions on each column to return a value.
CODE:
df read.csv("sample.csv")
*#Generate new variable name*
df$new_var <- 0
*#For loop through columns 16 to 45*
for (i in colnames(df[16:45])) {
df <- df %>%
mutate(new_var=
case_when(
grepl("I8501", df[[i]]) ~ 1
))
}
This does not work as when I table the results, I only get 1 value matched.
My other attempt was using:
for (i in colnames(df[16:45])) {
df <- df %>%
mutate(new_var=
case_when(
df[[i]] == "I8501" ~ 1
))
}
Any other possible ways to run through multiple columns with multiple conditions and change the value of the variable accordingly? to be achieved using R ?
CodePudding user response:
If I'm understanding what you want, I think you just need to specify another case in your case_when()
for keeping the existing values when things don't match "I8501"
. This is how I would do that:
df$new_var <- 0
for (index in (16:45)) {
df <- df %>%
mutate(
new_var = case_when(
grepl("I8501", df[[index]]) ~ 1,
TRUE ~ df$new_var
)
)
}
I think a better way to do this though would be to use the ever useful apply()
:
has_match = apply(df[, 16:45], 1, function(x) sum(grepl("I8501", x)) > 0)
df$new_var = ifelse(has_match, 1, 0)