I am writing my own normalization function for scRNA-seq data, because many packages assume you are working with dense matrices (which I neither want nor need to do). If you don't know what that means, it's not an issue. Essentially, I want a function that divides all values in a column by a value specific to that column. To that end, I supply a vector of values whose length equals the number of columns in the data frame. Other transformations are then applied. If no vector is provided, the sum of the values in the column that apply is currently working on is used instead. Is there a way to do this without a for loop, to keep it vectorized?
# function to run in apply
pseudocount_log2p1_transform <- function(x, scale_factor = 10000, UMI.provided = NULL){
  if(is.null(UMI.provided)){
    # no per-column value supplied: fall back to the column total
    counts <- sum(x)
  }else{
    counts <- UMI.provided
  }
  x <- (x + 1)/counts   # add the pseudocount, then divide by the column's count
  x <- x/scale_factor
  return(log2(x))
}
# function which needs fixing
pavlab.normalize <- function(df, UMI = NULL){
  df.cols <- colnames(df)
  df.rows <- rownames(df)
  if( is.null(UMI)){
    df <- data.frame(apply(df, MARGIN = 2, pseudocount_log2p1_transform))
  }else{
    # this line needs to be modified, so it's providing the column specific count value
    df <- data.frame(apply(df, MARGIN = 2, pseudocount_log2p1_transform(UMI.provided=UMI)))
  }
  colnames(df) <- df.cols
  rownames(df) <- df.rows
  return(df)
}
# reproducible example
df.example <- data.frame(a = c(1, 0, 1, 2),
                         b = c(5, 6, 8, 5),
                         c = c(4, 5, 4, 4))
count.list <- c(5, 25, 18)
# how do I fix this....?
pavlab.normalize(df = df.example, UMI = count.list)
CodePudding user response:
If we want each column to be paired with its corresponding value of UMI, then instead of apply, which loops over the columns of df only, we can use Map with df and UMI as input arguments. Map then loops over the columns of the data.frame and the elements of the vector in parallel, assuming ncol(df) and length(UMI) are the same.
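To see the pairing on its own, here is a minimal sketch using a toy data frame and a vector of divisors, both invented for illustration (the names toy, divisors and res are not from the question). It only demonstrates how Map walks two inputs in parallel; it is not the normalization itself.

```r
# Toy inputs, invented for illustration only
toy <- data.frame(a = c(2, 4), b = c(10, 20))
divisors <- c(2, 10)  # one value per column, in column order

# Map passes column 1 with divisors[1], column 2 with divisors[2], and so on
res <- Map(function(col, d) col / d, toy, divisors)

# The result is a list with one element per column, named after the columns
str(res)

# Assigning into toy[] keeps the data.frame shape and its column names
toy[] <- res
toy
```

Because Map returns a list, assigning it to df[] rather than to df is what keeps the result a data.frame.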
pavlab.normalize <- function(df, UMI = NULL){
  df.cols <- colnames(df)
  df.rows <- rownames(df)
  if( is.null(UMI)){
    df <- data.frame(apply(df, MARGIN = 2, pseudocount_log2p1_transform))
  }else{
    # pair each column of df with its own element of UMI
    df[] <- Map(pseudocount_log2p1_transform, df, UMI.provided = UMI)
  }
  colnames(df) <- df.cols
  rownames(df) <- df.rows
  return(df)
}
-testing
> pavlab.normalize(df = df.example, UMI = count.list)
          a         b         c
1 -14.60964 -15.34661 -15.13571
2 -15.60964 -15.12421 -14.87267
3 -14.60964 -14.76164 -15.13571
4 -14.02468 -15.34661 -15.13571
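As a possible alternative to Map, base R's sweep can divide every column by its own value in a single call. The sketch below is an assumption-laden variant, not part of the answer above: the helper name normalize_sweep is hypothetical, it repeats the same arithmetic as pseudocount_log2p1_transform, and it converts the input with as.matrix, so it only suits data that already fits in a dense data frame or matrix like the example.

```r
# Example data from the question
df.example <- data.frame(a = c(1, 0, 1, 2),
                         b = c(5, 6, 8, 5),
                         c = c(4, 5, 4, 4))
count.list <- c(5, 25, 18)

# Hypothetical helper; same arithmetic as pseudocount_log2p1_transform
normalize_sweep <- function(df, UMI = NULL, scale_factor = 10000) {
  m <- as.matrix(df)
  # default: use each column's total when no values are supplied
  if (is.null(UMI)) UMI <- colSums(m)
  stopifnot(length(UMI) == ncol(m))
  # sweep divides column j of (m + 1) by UMI[j]
  out <- log2(sweep(m + 1, MARGIN = 2, STATS = UMI, FUN = "/") / scale_factor)
  as.data.frame(out)
}

normalize_sweep(df.example, UMI = count.list)
```

With the example data this should reproduce the values shown above, since the per-column division is the only step that differs in how it is written.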