Why is the index in my R code numeric and not NA?

Time:10-21

The question for my R exam is to write a function that takes all the NAs in a vector and replaces them with the average of all the numeric values. This is what I wrote:

    na_replace <- function (x)
       {for (i in 1:length(x)) 
          if (is.numeric(x[i]))
            {average<- c(is.numeric(x[i]))}
            if (is.na(x[i])) 
          {x[i] = mean(average)}
           return(x)}

What I get when I input the vector c(1,NA,3,NA) is 1,NA,3,1. When I checked the vector, it says that the first NA is numeric and the second one is NA. Why is that so?

Answer:

NA is not a type. There are several typed NA constants, one each for the logical, integer, double, complex and character types (NA, NA_integer_, NA_real_, NA_complex_ and NA_character_). Yours was NA_real_, the double (numeric) one, so is.numeric() returns TRUE for it. Read the help page at ?NA. The function you needed is is.na(): it works with every type of NA and returns a logical vector suitable for indexing.
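A quick check in the console makes the point about typed NAs concrete. This is a minimal sketch using only base R; the vector x mirrors the one from the question.

```r
# typeof() reports the storage type of each NA constant
typeof(NA)            # "logical"
typeof(NA_integer_)   # "integer"
typeof(NA_real_)      # "double"
typeof(NA_character_) # "character"

# In c(1, NA, 3, NA) the logical NA is coerced to double, so both
# missing values are NA_real_ and is.numeric() is TRUE for each of them
x <- c(1, NA, 3, NA)
typeof(x[2])          # "double"
is.numeric(x[2])      # TRUE
is.na(x)              # FALSE TRUE FALSE TRUE
```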

    my.bad.imputation.fun <- function(x) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      x
    }
    my.x <- c(1, NA, 3, NA)
    my.bad.imputation.fun(my.x)
    #[1] 1 2 3 2

Note the lack of loops. I hope using for-loops was a habit you picked up from another language and not a strategy you were taught in your class. R code uses far fewer for-loops than, say, BASIC or C, because R has many vectorized functions that operate on a whole vector at once and replace for-loops for iterative operations.
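As a further illustration of a vectorized function standing in for a loop, base R's ifelse() can express the same replacement in a single call. This is a sketch, not part of the original answer; the name na_replace_ifelse is hypothetical.

```r
# Hypothetical name; same imputation as above, written with ifelse()
na_replace_ifelse <- function(x) {
  # ifelse() works element by element over the whole vector: where is.na(x)
  # is TRUE it takes the mean of the non-missing values, otherwise x itself
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
}

na_replace_ifelse(c(1, NA, 3, NA))
# [1] 1 2 3 2
```

The mean is computed once and recycled across the positions where is.na(x) is TRUE. For plain numeric vectors this matches the subset-assignment version, but ifelse() drops the class of inputs such as Date vectors, so x[is.na(x)] <- ... is usually the safer choice.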

Answer:

There are a few things going wrong here. As IRTFM mentioned, NA is not a class, but I want to dig a bit deeper into the code itself:

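I imagine that you want your average to be 2 here, no? In that case, if (is.numeric(x[i])) {average<- c(is.numeric(x[i]))} does not do what you want: is.numeric() returns TRUE or FALSE rather than the value itself, and it is TRUE even for NA_real_, so average always ends up as TRUE and mean(average) is 1. That is where the 1 in your output comes from. You want a single average for the entire vector, so let's change that to the following: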

    average <- mean(x, na.rm = TRUE)

The na.rm = TRUE argument tells mean() to ignore the NA values, so in your example it takes the average of 1 and 3.
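The two mean() calls can be compared directly. A small sketch in base R, reproducing what each pass of the original loop assigned to average and what mean() returns with and without na.rm:

```r
x <- c(1, NA, 3, NA)

# What each pass of the original loop stored: is.numeric() returns a
# logical, not the element's value
average <- c(is.numeric(x[1]))
average        # TRUE
mean(average)  # 1, because TRUE is treated as 1

# Without na.rm, any NA makes the result NA
mean(x)                # NA
# With na.rm = TRUE the NAs are dropped first: (1 + 3) / 2
mean(x, na.rm = TRUE)  # 2
```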

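Next, you want to make sure you put {} around all the code that should run inside your for-loop, just as you already do with your if-statements. Without braces, only the first statement after for (...) belongs to the loop. In your function that is the first if, so the second if runs just once, after the loop has finished, with i equal to length(x); that is why only the last NA was replaced. Braces are not technically needed when the loop body is a single statement, but they are good practice nonetheless. It would look like this: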

    for (i in seq_along(x)) {
      if (is.na(x[i])) {
        x[i] <- average
      }
    }
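To see why the original function changed only the last NA, the sketch below runs the question's loop at top level with the same missing braces and then inspects i afterwards. It is a reproduction for illustration only; x is the vector from the question.

```r
x <- c(1, NA, 3, NA)

# Original layout: with no braces around the loop body, the for loop
# contains only the first if statement
for (i in 1:length(x))
  if (is.numeric(x[i])) {
    average <- c(is.numeric(x[i]))
  }

# The second if runs once, after the loop, with i left at its last value
i                                          # 4
if (is.na(x[i])) { x[i] <- mean(average) } # replaces only x[4], with mean(TRUE)
x                                          # 1 NA 3 1
```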

However, doing all this work with an explicit for-loop and if-statement is really unnecessary. You can replace the entire loop above with a single line:

    x[is.na(x)] <- average
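Step by step, that one-liner relies on logical indexing. A minimal sketch, where the helper variable mask is a hypothetical name introduced only to show the intermediate value:

```r
x <- c(1, NA, 3, NA)

# is.na() returns one logical per element
mask <- is.na(x)
mask          # FALSE TRUE FALSE TRUE
which(mask)   # 2 4, the positions that will be overwritten

# Assigning through a logical index touches only the TRUE positions;
# the single value on the right-hand side is recycled to fill them all
x[mask] <- mean(x, na.rm = TRUE)
x             # 1 2 3 2
```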

If we then put everything together, your function can be as small as this:

    na_replace <- function(x) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      return(x)
    }
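A few example calls, repeating the final function so the snippet runs on its own. The integer and all-NA inputs are extra cases not covered in the answer: an integer vector is coerced to double because the mean is generally not a whole number, and a vector with no non-missing values gets NaN, since mean() of an empty vector is NaN.

```r
na_replace <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}

na_replace(c(1, NA, 3, NA))         # 1 2 3 2
na_replace(c(10, 20, 30))           # no NAs, returned unchanged: 10 20 30
na_replace(c(2L, NA, 5L))           # coerced to double: 2.0 3.5 5.0
na_replace(c(NA_real_, NA_real_))   # nothing to average: NaN NaN
```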