I have a fairly complex function that modifies some character variables. While coding the function, I bumped into a curious problem with handling NA values. I will spare you the complex function and instead present the problem in the MWE below:
# Create an example data frame
df <- data.frame(noun = c("apple", NA, "banana"))
# Display the example data frame
df
#>     noun
#> 1  apple
#> 2   <NA>
#> 3 banana
# Introduce the function
process_my_df <- function(input_data, my_var) {
  # Create a new variable based on an existing variable
  for (i in 1:nrow(input_data)) {
    if (!is.na(input_data[[my_var]][i])) {
      input_data[[paste0(my_var, "_result")]][i] <- "is a fruit"
    }
  }
  return(input_data)
}
# Call the function to process the data frame
processed_df <- process_my_df(df, "noun")
# Display the resulting df
processed_df
#>     noun noun_result
#> 1  apple  is a fruit
#> 2   <NA>  is a fruit
#> 3 banana  is a fruit
Created on 2023-11-03 with reprex v2.0.2
My question: given the condition if (!is.na(input_data[[my_var]][i])) {}, I would expect row 2 of noun_result to stay NA, like this:
#>     noun noun_result
#> 1  apple  is a fruit
#> 2   <NA>        <NA>
#> 3 banana  is a fruit
What's going on?
EDIT:
As a result of the accepted answer below, I added one simple line inside the function and now everything works fine:
# Introduce the function
process_my_df <- function(input_data, my_var) {
  # Create a new variable based on an existing variable
  # But first, "prime" it with NA_character_
  input_data[[paste0(my_var, "_result")]] <- NA_character_
  for (i in 1:nrow(input_data)) {
    if (!is.na(input_data[[my_var]][i])) {
      input_data[[paste0(my_var, "_result")]][i] <- "is a fruit"
    }
  }
  return(input_data)
}
CodePudding user response:
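The issue happens because the new column is created implicitly. On the first iteration, noun_result does not exist yet, so assigning to its first element produces a length-one vector, and R recycles that single value to fill every row of the new column. Row 2 is then skipped by the condition and simply keeps the recycled value. If you create the column explicitly before calling the function, it works correctly: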
# Create the result column explicitly before processing
df$noun_result <- ""
# Call the function to process the data frame
processed_df <- process_my_df(df, "noun")
# Display the resulting df
processed_df
#     noun noun_result
# 1  apple  is a fruit
# 2   <NA>
# 3 banana  is a fruit
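To see why the implicit creation fills every row, the first loop iteration can be reproduced by hand. This is only a minimal sketch; the names demo, new_col and x are made up for illustration and are not part of the original code:

```r
demo <- data.frame(noun = c("apple", NA, "banana"))

# The column does not exist yet, so reading it returns NULL
is.null(demo[["new_col"]])

# Assigning into element 1 of NULL yields a length-one character vector
x <- NULL
x[1] <- "is a fruit"
length(x)

# This is effectively what the first loop iteration does: the length-one
# value is recycled to all rows when it becomes a data frame column
demo[["new_col"]][1] <- "is a fruit"
demo$new_col
```

After the last assignment, every element of demo$new_col holds "is a fruit", including the row where noun is NA, which matches the output in the question.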
CodePudding user response:
Given the explanation provided by @Andrey Shabalin, you need an else branch that explicitly writes NA for the rows the condition skips:
process_my_df <- function(input_data, my_var) {
  # Create a new variable based on an existing variable
  for (i in 1:nrow(input_data)) {
    if (!is.na(input_data[[my_var]][i])) {
      input_data[[paste0(my_var, "_result")]][i] <- "is a fruit"
    } else {
      input_data[[paste0(my_var, "_result")]][i] <- NA
    }
  }
  return(input_data)
}
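If the real function allows it, the loop could also be dropped in favour of a vectorised assignment, which creates the whole column in one step and so avoids the recycling problem altogether. The following is a sketch under that assumption; process_my_df_vec is a hypothetical name, not something from the question:

```r
# Hypothetical vectorised variant of process_my_df
process_my_df_vec <- function(input_data, my_var) {
  result_name <- paste0(my_var, "_result")
  # ifelse() works on the whole column at once: NA stays NA,
  # everything else gets the label
  input_data[[result_name]] <- ifelse(
    is.na(input_data[[my_var]]),
    NA_character_,
    "is a fruit"
  )
  input_data
}

df <- data.frame(noun = c("apple", NA, "banana"))
out <- process_my_df_vec(df, "noun")
out
```

Because the column is assigned as a full-length vector, there is no need to prime it with NA_character_ first.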