Why does an if loop not catch an NA value correctly inside a function in R?

Time:11-05

I have a fairly complex function that modifies some character variables. While coding the function, I bumped into a curious problem with handling NA values. I will spare you the complex function and instead present the problem in the MWE below:

# Create an example data frame
df <- data.frame(noun = c("apple", NA, "banana"))

# Display the example data frame
df
#>     noun
#> 1  apple
#> 2   <NA>
#> 3 banana

# Introduce the function 
process_my_df <- function(input_data, my_var) {
  # Create a new variable based on an existing variable
  for (i in 1:nrow(input_data)) {
    if (!is.na(input_data[[my_var]][i])) {
      input_data[[paste0(my_var, "_result")]][i] <- "is a fruit"
    }
  }
  return(input_data)
}

# Call the function to process the data frame
processed_df <- process_my_df(df, "noun")

# Display the resulting df
processed_df
#>     noun noun_result
#> 1  apple  is a fruit
#> 2   <NA>  is a fruit
#> 3 banana  is a fruit

Created on 2023-11-03 with reprex v2.0.2

My question: based on the condition if (!is.na(input_data[[my_var]][i])) {} I would expect the following result:

#>     noun noun_result
#> 1  apple  is a fruit
#> 2   <NA>        <NA>
#> 3 banana  is a fruit

What's going on?

EDIT:

As a result of the accepted answer below, I added one simple line inside the function and now everything works fine:

# Introduce the function 
process_my_df <- function(input_data, my_var) {
  # Create a new variable based on an existing variable
  
  # But first, "prime" it with NA_character_
  input_data[[paste0(my_var, "_result")]] <- NA_character_
  
  for (i in 1:nrow(input_data)) {
    if (!is.na(input_data[[my_var]][i])) {
      input_data[[paste0(my_var, "_result")]][i] <- "is a fruit"
    }
  }
  return(input_data)
}


CodePudding user response:

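The issue arises because the new column is created implicitly. On the first iteration the column does not exist yet, so the assignment to element [i] creates it from a single value, and R recycles that value to every row, including the NA row. Later iterations that skip the assignment leave the recycled value in place. If you create the column explicitly before calling the function, it works correctly: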

# Call the function to process the data frame
df$noun_result = ""
processed_df <- process_my_df(df, "noun")

# Display the resulting df
processed_df
#     noun noun_result
# 1  apple  is a fruit
# 2   <NA>            
# 3 banana  is a fruit
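
To see the recycling in isolation, here is a minimal sketch. The names demo and new_col are made up for illustration and are not part of the original post; the steps mirror what the first loop iteration effectively does.

```r
# Hypothetical toy data frame; `demo` and `new_col` are illustrative names only
demo <- data.frame(noun = c("apple", NA, "banana"))

# Extracting a column that does not exist yet returns NULL
is.null(demo[["new_col"]])

# Assigning into element 1 of NULL yields a length-1 character vector
x <- NULL
x[1] <- "is a fruit"
length(x)

# Storing that length-1 vector as a new column recycles it to all rows,
# so row 2 is filled even though its noun is NA
demo[["new_col"]][1] <- "is a fruit"
demo$new_col
```

After the last assignment every row of new_col holds "is a fruit", which is why the skipped NA row in the question still shows a value.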

CodePudding user response:

Given the explanation provided by @Andrey Shabalin, you need an else branch that assigns NA explicitly:

process_my_df <- function(input_data, my_var) {
  # Create a new variable based on an existing variable
  for (i in 1:nrow(input_data)) {
    if (!is.na(input_data[[my_var]][i])) {
      input_data[[paste0(my_var, "_result")]][i] <- "is a fruit"
    } else {
      input_data[[paste0(my_var, "_result")]][i] <- NA
    }
  }
  return(input_data)
}
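
As an aside, the loop can be avoided altogether. The following is only a sketch of a vectorized variant, assuming the same "_result" naming as above; the function name process_my_df_vec is hypothetical and does not appear in the original question.

```r
# Hypothetical vectorized variant; `process_my_df_vec` is not from the original post
process_my_df_vec <- function(input_data, my_var) {
  # ifelse() evaluates the condition for all rows at once, so every row
  # receives its own value and no single element is recycled
  input_data[[paste0(my_var, "_result")]] <- ifelse(
    is.na(input_data[[my_var]]),
    NA_character_,
    "is a fruit"
  )
  input_data
}

# Same example data as in the question
df <- data.frame(noun = c("apple", NA, "banana"))
result <- process_my_df_vec(df, "noun")
result$noun_result
```

Because the whole column is assigned in one step, the result does not depend on which row happens to be processed first.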