Using column specific value from a list of values within an apply function

I am writing my own normalization function for scRNA-seq data, because many packages assume you are working with dense matrices (which I neither want nor need to use). If you don't know what that means, it's not an issue. Essentially, I want an apply function to divide all values in a column by a value specific to that column. To that end, I supply a vector of values whose length equals the number of columns in the data frame. Other transformations are then applied. If no vector is provided, the sum of the values in the column that apply is currently working on is used instead. Is there a way to do this without a for loop, to keep it vectorized?

# function to run in apply
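pseudocount_log2p1_transform <- function(x, scale_factor = 10000, UMI.provided = NULL){
  # use the supplied count if there is one, otherwise the column sum
  if(is.null(UMI.provided)){
    counts <- sum(x)
  } else {
    counts <- UMI.provided
  }
  x <- (x + 1)/counts   # add a pseudocount of 1, then divide by the column's count
  x <- x/scale_factor
  return(log2(x))
}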
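Written out for a single entry, the transform maps a count x_ij (row i, column j) to the value below. Here u_j is the count used for column j (the supplied UMI value or, by default, the column sum) and s is scale_factor; these symbols are introduced only for illustration.

```latex
y_{ij} = \log_2\!\left(\frac{x_{ij} + 1}{u_j \, s}\right),
\qquad
u_j = \begin{cases} \mathrm{UMI}_j & \text{if UMI is supplied} \\ \sum_i x_{ij} & \text{otherwise} \end{cases}
```

For example, for the first entry of column a in the reproducible example (x = 1, u = 5, s = 10000): log2((1 + 1) / (5 * 10000)) = log2(4e-5), which is about -14.60964.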

# function which needs fixing
pavlab.normalize <- function(df, UMI = NULL){
  df.cols <- colnames(df)
  df.rows <- rownames(df)
  if( is.null(UMI)){
    df <- data.frame(apply(df,  MARGIN = 2, pseudocount_log2p1_transform))
  }else{
# this line needs to be modified so that it provides the column-specific count value
    df <- data.frame(apply(df,  MARGIN = 2, pseudocount_log2p1_transform(UMI.provided=UMI)))
  }
  colnames(df) <- df.cols
  rownames(df)<- df.rows
  return(df)
}

# reproducible example
df.example <- data.frame(a = c(1, 0, 1, 2),
                         b = c(5, 6, 8, 5),
                         c = c(4, 5, 4, 4))

count.list <- c(5, 25, 18)

# how do I fix this....?
pavlab.normalize(df = df.example, UMI = count.list)

Answer:

In the code, apply loops over the columns of 'df' only, so it has no way to hand each call the matching element of 'UMI'. If we want to apply each value of UMI to the corresponding column, we can use Map instead, with 'df' and 'UMI' as input arguments. Map loops over the columns of the data.frame and the elements of the vector in parallel. This assumes that ncol(df) and length(UMI) are the same.
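To show how Map pairs its arguments, here is a small toy sketch. The data frame and the divisors vector are made up for illustration only. It divides each column by its own value and then writes the resulting list back with df[] <-, the same idiom the answer uses.

```r
# Toy data for illustration only: two columns, one divisor per column
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
divisors <- c(1, 10)

# Map() walks the columns of df and the elements of divisors in parallel:
# the first call gets df$a and 1, the second gets df$b and 10
res <- Map(function(col, d) col / d, df, divisors)

# Map() returns a list; assigning into df[] keeps the data.frame
# class, column names and row names
df[] <- res
print(df)  # both columns are now c(1, 2, 3)
```

Because a data.frame is a list of columns, Map treats each column as one element. That is why no explicit indexing by column number is needed.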

pavlab.normalize <- function(df, UMI = NULL){
  df.cols <- colnames(df)
  df.rows <- rownames(df)
  if( is.null(UMI)){
    df <- data.frame(apply(df,  MARGIN = 2, pseudocount_log2p1_transform))
  }else{
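    # Map pairs each column of df with the matching element of UMI;
    # assigning into df[] keeps the result a data.frame
    df[] <- Map(pseudocount_log2p1_transform, df, UMI.provided = UMI)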
  }
  colnames(df) <- df.cols
  rownames(df)<- df.rows
  return(df)
}

Testing:

> pavlab.normalize(df = df.example, UMI = count.list)
          a         b         c
1 -14.60964 -15.34661 -15.13571
2 -15.60964 -15.12421 -14.87267
3 -14.60964 -14.76164 -15.13571
4 -14.02468 -15.34661 -15.13571
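Map still calls the function once per column, so it is an apply-style loop rather than a fully vectorized computation. If whole-matrix arithmetic is preferred, one option is to do the same calculation on all columns at once. This is a sketch and not part of the answer above. It uses colSums for the default counts and sweep to divide each column by its own value. The helper name normalize_vectorized is hypothetical, and it assumes every column of the data frame is numeric.

```r
# Hypothetical helper (not from the original answer): same arithmetic as
# pseudocount_log2p1_transform, i.e. log2((x + 1) / counts / scale_factor),
# computed on the whole matrix instead of one column at a time.
normalize_vectorized <- function(df, UMI = NULL, scale_factor = 10000) {
  m <- as.matrix(df)                 # assumes every column is numeric
  if (is.null(UMI)) {
    UMI <- colSums(m)                # default: per-column sum of the raw counts
  }
  stopifnot(length(UMI) == ncol(m))  # one count per column
  # sweep() with MARGIN = 2 divides column j by UMI[j]
  m <- sweep(m + 1, MARGIN = 2, STATS = UMI, FUN = "/")
  out <- as.data.frame(log2(m / scale_factor))
  rownames(out) <- rownames(df)      # keep e.g. gene names as row names
  out
}

df.example <- data.frame(a = c(1, 0, 1, 2),
                         b = c(5, 6, 8, 5),
                         c = c(4, 5, 4, 4))
count.list <- c(5, 25, 18)
print(normalize_vectorized(df.example, UMI = count.list))
```

The per-entry arithmetic is identical, so with df.example and count.list this gives the same values as the Map version in the testing output. Adding a pseudocount of 1 turns every zero into a nonzero value, so the result is dense whichever approach is used.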