The question for my R exam is to write a function that takes all the NAs in a vector and replaces them with the average of all the numeric values. This is what I wrote:
na_replace <- function (x)
{for (i in 1:length(x))
if (is.numeric(x[i]))
{average<- c(is.numeric(x[i]))}
if (is.na(x[i]))
{x[i] = mean(average)}
return(x)}
When I input the vector c(1,NA,3,NA), I get 1, NA, 3, 1. When I checked the vector, it says that the first NA is numeric and the second one is NA. Why is that?
Answer:
NA is not a type. There are several kinds of NA (the logical NA, NA_integer_, NA_real_, NA_complex_ and NA_character_), and yours was NA_real_, the numeric (double) NA. Read the help page at ?NA. There's a function named is.na that would have been of use, because it returns a logical vector suitable for indexing. The is.na function works with all types of NA.
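A short sketch of the point above, using only base R: each kind of NA has its own type, the NA elements of a numeric vector are double NAs (which is why is.numeric() says TRUE for them), and is.na() detects all of them. The variable names are just for illustration.

```r
# Each atomic type has its own missing-value constant.
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# Inside a numeric vector, the NA elements are double NAs,
# so is.numeric() reports TRUE for them.
x <- c(1, NA, 3, NA)
typeof(x[2])           # "double"
is.numeric(x[2])       # TRUE

# is.na() recognises every kind of NA ...
sapply(list(NA, NA_integer_, NA_real_, NA_character_), is.na)  # TRUE TRUE TRUE TRUE

# ... and on a vector it returns a logical mask usable for indexing.
is.na(x)               # FALSE TRUE FALSE TRUE
```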
my.bad.imputation.fun <- function(x){ x[is.na(x)] <- mean(x, na.rm=TRUE); x }
my.x <- c(1,NA,3,NA)
my.bad.imputation.fun(my.x)
#[1] 1 2 3 2
Note the lack of loops. I hope using for-loops was a habit you picked up from another language and not a strategy you were taught in your class. R code does not use as many for-loops as, say, BASIC or C. R has many vectorized functions that replace for-loops for iterative operations.
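As a small illustration of the vectorized style, here is a sketch that builds the logical mask once and uses it for a single assignment, alongside base R's replace(), which does the same thing but returns a modified copy instead of assigning in place. Both give the same result as the function above.

```r
x <- c(1, NA, 3, NA)

# Build the logical mask once, then assign to every masked position at once.
missing <- is.na(x)            # FALSE TRUE FALSE TRUE
x_vec <- x
x_vec[missing] <- mean(x, na.rm = TRUE)

# replace(x, list, values) returns a copy of x with the given positions replaced.
x_rep <- replace(x, is.na(x), mean(x, na.rm = TRUE))

x_vec                          # 1 2 3 2
identical(x_vec, x_rep)        # TRUE
```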
Answer:
There are a few things going wrong here. As IRTFM mentioned, NA is not a class, but I want to dig a bit more into the code itself as well:
I imagine that you want your average to be 2 here, no? In that case, if (is.numeric(x[i])) {average<- c(is.numeric(x[i]))} doesn't do what you want: is.numeric() returns a single TRUE or FALSE, so average just ends up being TRUE rather than holding the numbers in your vector. You want one single average for the entire vector, so let's change that to the following:
average <- mean(x, na.rm = TRUE)
The na.rm argument makes mean() ignore the NA values, so in your example it takes the average of 1 and 3.
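A quick check of what mean() does with and without na.rm, plus what the question's average variable actually contained. It shows where the 1 in the output came from: mean() treats TRUE as 1.

```r
x <- c(1, NA, 3, NA)

mean(x)                  # NA: a single NA in the input makes the result NA
mean(x, na.rm = TRUE)    # 2: the NAs are dropped first, leaving mean(c(1, 3))

# What the question's `average` held: is.numeric() returns one logical value,
# and for a numeric vector that value is TRUE.
average <- c(is.numeric(x[4]))
average                  # TRUE
mean(average)            # 1, because TRUE is counted as 1
```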
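Next, you want to make sure you put {} around all the code that is run within your for-loop, just like you are already doing with your if-statements. It isn't technically needed when the loop body is a single statement, but it is good practice nonetheless. In your original function this is actually what went wrong: without braces, only the first if-statement is inside the loop, and the second one runs just once after the loop has finished, with i equal to length(x). That is why only the last NA was changed. With braces, it looks like this: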
for (i in seq_along(x)) {  # seq_along() also handles a zero-length x, unlike 1:length(x)
  if (is.na(x[i])) {
    x[i] <- average
  }
}
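For comparison, here is a sketch of how R reads the question's original function once the implicit braces are written out. The name na_replace_as_parsed is hypothetical; the body is the question's code, only re-braced and indented to match how it is parsed. It reproduces the reported output exactly.

```r
# Hypothetical name: the question's code with its actual structure made explicit.
na_replace_as_parsed <- function(x) {
  for (i in 1:length(x)) {
    # Only this if-statement is the loop body in the original code.
    if (is.numeric(x[i])) {
      average <- c(is.numeric(x[i]))  # always TRUE for a numeric vector
    }
  }
  # This if-statement runs once, after the loop, when i == length(x).
  if (is.na(x[i])) {
    x[i] <- mean(average)             # mean(TRUE) is 1
  }
  return(x)
}

na_replace_as_parsed(c(1, NA, 3, NA))  # 1 NA 3 1
```

Only x[4] is ever tested by is.na(), so the NA in position 2 survives and position 4 becomes 1.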
However, doing all this work with an explicit for-loop and if-statement is really unnecessary. You can replace the whole loop with a single vectorized assignment:
x[is.na(x)] <- average
Putting everything together, your function can be as small as this:
na_replace <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}
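A few usage examples for the final function, including some edge cases worth knowing about: an integer input is promoted to double when a non-whole mean is assigned into it, a vector without NAs comes back unchanged, and a vector that is entirely NA becomes NaN because there is nothing left to average. How to handle the all-NA case is an assumption left to you; the function as written does not guard against it.

```r
na_replace <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}

na_replace(c(1, NA, 3, NA))        # 1 2 3 2
na_replace(c(1L, NA, 2L))          # 1.0 1.5 2.0: integer input promoted to double
na_replace(c(5, 7))                # 5 7: no NAs, so nothing changes
na_replace(c(NA_real_, NA_real_))  # NaN NaN: mean() of zero values is NaN
```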