Problem with missing values when calculating the median

Time:10-16

I'm having trouble handling NAs when calculating a cell-by-cell median across multiple matrices.

This is an example of the code and data I'm working on:

#Data example

m1 = matrix(c(2, 4, 3, 1),nrow=2, ncol=2, byrow = TRUE)  
m2 = matrix(c(NA, 5, 7, 9),nrow=2, ncol=2, byrow = TRUE)
m3 = matrix(c(NA, 8, 10, 14),nrow=2, ncol=2, byrow = TRUE)
 
Median calculation

apply(abind::abind(m1, m2, m3, along = 3), 1:2, median)
     [,1] [,2]
[1,]   NA    5
[2,]    7    9

As expected, the function returns NA for cells that contain NAs.
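The abind() call above stacks the three matrices into a 2 x 2 x 3 array, and apply() with margins 1:2 calls median() once per cell on the three values found at that position. As a base R sketch of the same computation, assuming the abind package is not available, array() can do the stacking instead:

```r
# Same example matrices as in the question
m1 <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2, byrow = TRUE)
m2 <- matrix(c(NA, 5, 7, 9), nrow = 2, ncol = 2, byrow = TRUE)
m3 <- matrix(c(NA, 8, 10, 14), nrow = 2, ncol = 2, byrow = TRUE)

# Stack the matrices as slices of a 2 x 2 x 3 array (base R stand-in for
# abind::abind(m1, m2, m3, along = 3)); c() flattens each matrix column-major,
# which is the same order array() fills in, so arr[, , k] is the k-th matrix
arr <- array(c(m1, m2, m3), dim = c(nrow(m1), ncol(m1), 3))

# Margins 1:2 mean "for every (row, column) cell": median() receives the
# 3 values at that position, so cell [1, 1] sees c(2, NA, NA) and returns NA
res <- apply(arr, 1:2, median)
print(res)
```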

The problem is that if I replace the NAs with 0, I get this output instead:

#Data example

m1 = matrix(c(2, 4, 3, 1),nrow=2, ncol=2, byrow = TRUE)  
m2 = matrix(c(0, 5, 7, 9),nrow=2, ncol=2, byrow = TRUE)
m3 = matrix(c(0, 8, 10, 14),nrow=2, ncol=2, byrow = TRUE)
 
Median calculation

apply(abind::abind(m1, m2, m3, along = 3), 1:2, median)
     [,1] [,2]
[1,]    0    5
[2,]    7    9

What I want instead is for NAs to be skipped, so that only the non-missing values are taken into account. In the example, a cell with NA, NA, 2 should give 2, while (outside the example) a cell with NA, 2, 5 should give 3.5. For the example data, the expected output is:

     [,1] [,2]
[1,]    2    5
[2,]    7    9

Do you have any idea how I could get this result? Any suggestion would be appreciated, thanks.

Answer:

Perhaps you should drop the NAs first? Try adding na.rm = TRUE to the median() call.
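The effect of na.rm can be checked on a single cell's values before applying it to the whole array. This minimal base R sketch uses the cell values from the question (cell [1, 1] of the example, plus the hypothetical NA, 2, 5 cell the asker mentions), and also shows why replacing NA with 0 gives a different answer:

```r
# Values of cell [1, 1] across m1, m2 and m3 in the question
cell_a <- c(2, NA, NA)
# The hypothetical cell described in the question: NA, 2, 5
cell_b <- c(NA, 2, 5)

print(median(cell_a))                 # NA: a missing value propagates by default
print(median(cell_a, na.rm = TRUE))   # 2: NAs are dropped before taking the median
print(median(cell_b, na.rm = TRUE))   # 3.5: mean of the two remaining values, 2 and 5

# Replacing NA with 0 is not equivalent: the zeros count as real observations
print(median(c(2, 0, 0)))             # 0
```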

Answer:

Just pass the argument na.rm = TRUE to apply(); any extra arguments given after the function are forwarded to median().

apply(abind::abind(m1, m2, m3, along = 3), 1:2, median, na.rm = TRUE)

Output:

     [,1] [,2]
[1,]    2    5
[2,]    7    9
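To go beyond three hard-coded matrices, the same call can be wrapped so it accepts a list of any number of equally sized matrices. The helper name cellwise_median is hypothetical (it is not from the thread), and it uses base R's unlist() and array() in place of abind. A cell that is NA in every matrix still comes out as NA, because median() has no values left once na.rm = TRUE removes them.

```r
# Hypothetical helper (not from the thread): cell-wise median over a list of
# equally sized matrices, skipping NAs; base R only, no abind dependency
cellwise_median <- function(mats) {
  # unlist() concatenates the matrices column-major, so filling an array with
  # dim = c(rows, cols, number of matrices) puts each matrix in its own slice
  arr <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))
  # Margins 1:2 = one median() call per cell; na.rm = TRUE is forwarded to median()
  apply(arr, 1:2, median, na.rm = TRUE)
}

m1 <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2, byrow = TRUE)
m2 <- matrix(c(NA, 5, 7, 9), nrow = 2, ncol = 2, byrow = TRUE)
m3 <- matrix(c(NA, 8, 10, 14), nrow = 2, ncol = 2, byrow = TRUE)

res <- cellwise_median(list(m1, m2, m3))
print(res)

# A cell that is NA in every matrix has no values left after na.rm = TRUE,
# so median() returns NA for it
all_na <- cellwise_median(list(matrix(NA_real_, 2, 2), matrix(NA_real_, 2, 2)))
print(all_na)
```

Passing a list keeps the function independent of how many matrices there are; a list built with list() or lapply() works the same way.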