Compute minima across list elements with NA

Time:09-30

I have a kludgy solution but feel silly writing so much code for something that seems simple. It runs quickly enough on lists of a few dozen MB, so efficiency is not the concern, but I'd still like help simplifying it.

I have a large list of n elements, each a vector of length m. I need the m element-wise minima across all n vectors, that is, for each position the minimum over the n vectors (the code below should make this clear if it sounds confusing). The data contain NAs: at some positions every vector is NA, but at most positions at least one vector has a value. My code works fine, but it feels like there should be a much simpler way to get there. Can you streamline this code?

Specifically, is there a way to avoid the conditional for the minimum function, and is there an apply-family function that would let me avoid the first cbind?

# make data
rawval <- replicate(10, sample(c(1:10, NA), size = 10, replace = TRUE),
                    simplify = FALSE)

# this seems clunky, does this function have a name?
mymin <- function(x) ifelse(sum(x, na.rm = TRUE) > 0, min(x, na.rm = TRUE), NA)

# I don't see why I should need two apply family functions here
library(magrittr)  # provides %>%
tomin <- sapply(rawval, cbind) %>% apply(MARGIN = 1, FUN = mymin)

Apologies, I suspect this is a duplicate question :(

CodePudding user response:

You can bind the list into a matrix with do.call(cbind, ...) and then apply hablar::min_ to each row with apply. hablar::min_ returns NA if all the values are NA.

apply(do.call(cbind, rawval), 1, hablar::min_)

You may also use your own function if you don't want to use hablar::min_.

custom_min <- function(x) if(all(is.na(x))) NA else min(x, na.rm = TRUE)
apply(do.call(cbind, rawval), 1, custom_min)
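To see what the two steps do, here is a minimal sketch on a made-up input. The list toy and its values are hypothetical and chosen only so that one position is NA in every vector; custom_min is the function defined above.

```r
# Hypothetical toy input: three vectors of length 4; position 3 is NA in all of them
toy <- list(
  c(5L, NA, NA, 2L),
  c(3L, 7L, NA, NA),
  c(NA, 1L, NA, 9L)
)

custom_min <- function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)

# do.call(cbind, toy) builds a 4 x 3 matrix: one row per position, one column per vector
toy_mat <- do.call(cbind, toy)

# Row-wise minima; a row that is entirely NA yields NA instead of Inf
toy_min <- apply(toy_mat, 1, custom_min)
print(toy_min)
```

The result has one entry per position (row), so its length is m rather than n.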

CodePudding user response:

What you want is mapply. It applies a function to the corresponding elements of several lists or vectors in parallel. See its help page (?mapply).
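A small sketch of how mapply pairs up its arguments may help. The vectors a and b are made up for illustration. Note the difference between passing several vectors as separate arguments and passing a single list.

```r
# Hypothetical inputs for illustration
a <- c(1, 5, 3)
b <- c(4, 2, 6)

# Several arguments: FUN is called as min(a[1], b[1]), min(a[2], b[2]), ...
pair_min <- mapply(min, a, b)

# A single list: FUN is called once per list element, i.e. min(a), then min(b)
per_vec <- mapply(min, list(a, b))

# To pair up the elements of a list, pass its elements as separate arguments
pair_min_list <- do.call(mapply, c(list(FUN = min), list(a, b)))

print(pair_min)       # element-wise minima, length 3
print(per_vec)        # one minimum per vector, length 2
print(pair_min_list)  # same as pair_min
```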

Here is a suggested function. I'm not really sure about the sum part, but if I understood it correctly, you only want the min of the rows that have a positive sum.

I benchmarked my_function against your_function and got the following results:

rawval <- replicate(
    1000,
    sample(c(1:10, NA), size = 1000, replace = TRUE),
    simplify = FALSE
)

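my_function <- function(values) {
    # with a single list argument, mapply() calls FUN once per list element,
    # so sums and mins hold one value per vector in values
    sums <- mapply(sum, values, na.rm=TRUE)
    mins <- mapply(min, values, na.rm=TRUE)
    # blank out entries whose sum is not positive
    mins[sums <= 0] <- NA
    return(mins)
}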

your_function <- function(values) {
    mymin <- function(x) ifelse(sum(x, na.rm = TRUE) > 0, min(x, na.rm = TRUE), NA)

    # bind the vectors as columns, then take the minimum of each row
    tomin <- apply(sapply(values, cbind), MARGIN = 1, FUN = mymin)
    return(tomin)
}

microbenchmark::microbenchmark(
    your_function(rawval),
    my_function(rawval)
)
Unit: milliseconds
                  expr     min        lq      mean    median        uq     max neval
 your_function(rawval) 27.0570 27.500851 28.367863 27.939501 28.464301 36.4104   100
   my_function(rawval)  5.0708  5.260801  5.519062  5.347952  5.406151 12.6877   100
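For the element-wise minima asked about in the question, base R's pmin is another option worth trying. The following is only a sketch and is not taken from either answer: vals, the seed, and the forced all-NA position are assumptions made for illustration. With na.rm = TRUE, pmin should return NA only at positions where every vector is NA, which the check at the end confirms against the apply-based approach.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical data shaped like the question's: 10 vectors of length 10
vals <- replicate(10, sample(c(1:10, NA), size = 10, replace = TRUE),
                  simplify = FALSE)

# Force position 3 to be NA in every vector to exercise the all-NA case
vals <- lapply(vals, function(v) { v[3] <- NA; v })

# Reference result: bind to a matrix and take NA-aware row minima
custom_min <- function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
by_apply <- apply(do.call(cbind, vals), 1, custom_min)

# pmin takes the vectors as separate arguments, so splice the list in with do.call
by_pmin <- do.call(pmin, c(vals, na.rm = TRUE))

# Both approaches should agree, including NA at the all-NA position
stopifnot(isTRUE(all.equal(by_apply, by_pmin)), is.na(by_pmin[3]))
print(by_pmin)
```

This avoids both the explicit conditional and the intermediate matrix. No timings are given here, so benchmark it on your own data before relying on it for speed.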