How can lapply work with addressing columns as unknown variables?

Time:11-07

So, I have a list of strings named control_for. I have a data frame sampleTable in which some of the columns are named after strings in control_for. And I have a third object, dge_obj (a DGEList object), to which I want to append those columns. What I want to do is use lapply to loop through control_for and, for each string, find the column in sampleTable with the same name and add that column (as a factor) to the DGEList object. For example, doing it manually for just one string looks like this, and it works:

group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group

And I tried something like this:

lapply(control_for, function(x) {
  x <- as.factor(sampleTable[, x])
  dge_obj$samples$x <- x
})

Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
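The attempt actually has two separate problems, which the sketch below reproduces. To keep it runnable without edgeR, dge_obj is assumed to be a plain list holding only a samples data.frame, standing in for a real DGEList. First, assigning to dge_obj inside the anonymous function creates a local copy, so the global object never changes. Second, `$x` does not look at the value of x: it always refers to a column literally named "x".

```r
# Assumption: a plain list mimicking only the $samples part of a DGEList
dge_obj <- list(samples = data.frame(group = factor(c(1, 1, 2, 2))))
sampleTable <- data.frame(a = 1:4, b = 5:8)
control_for <- c("a", "b")

# The attempt from the question, with its closing parenthesis in place
invisible(lapply(control_for, function(x) {
  x <- as.factor(sampleTable[, x])
  dge_obj$samples$x <- x
}))

# Problem 1: only a local copy inside the function was modified
names(dge_obj$samples)     # returns "group"

# Problem 2: even on that local copy, `$x` adds a column named "x"
local_copy <- (function(x) {
  dge_obj$samples$x <- as.factor(sampleTable[, x])
  dge_obj
})("a")
names(local_copy$samples)  # returns c("group", "x")
```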

Answer:

Here are two base R ways of doing it. The DGEList object dge_obj is built from the example in help("DGEList") (see Data at the end), and sampleTable is a mock-up data.frame.

Define a vector common_vars holding the names in control_for that are also columns of sampleTable. Then create the new columns.

library(edgeR)

sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")

common_vars <- intersect(control_for, names(sampleTable))
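A small sketch of what intersect() does here. The extra name "batch" is hypothetical, added only to show that names without a matching column are silently dropped; setdiff() is one way to find them if you would rather warn about them.

```r
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
# "batch" is a hypothetical name with no matching column in sampleTable
control_for <- c("a", "batch", "b")

# Keeps the order of control_for and drops names that are not columns
common_vars <- intersect(control_for, names(sampleTable))
common_vars                               # returns c("a", "b")

# The requested names that were skipped
setdiff(control_for, names(sampleTable))  # returns "batch"
```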

1. for loop

for(x in common_vars){
  y <- sampleTable[[x]]              # column whose name is the value of x
  dge_obj$samples[[x]] <- factor(y)  # new column named after x, not "x"
}
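Why the for loop works where the lapply attempt did not can be checked without edgeR. As before, dge_obj is assumed to be a plain list with only a samples data.frame. A for loop does not create a new function scope, so the assignment changes the object itself, and `[[x]]` uses the value of x as the column name.

```r
# Assumption: a plain list mimicking only the $samples part of a DGEList
dge_obj <- list(samples = data.frame(group = factor(c(1, 1, 2, 2))))
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
common_vars <- intersect(c("a", "b"), names(sampleTable))

# Runs in the calling environment: no local copy of dge_obj is made
for (x in common_vars) {
  dge_obj$samples[[x]] <- factor(sampleTable[[x]])
}

names(dge_obj$samples)  # returns c("group", "a", "b")
```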

2. *apply loop.

tmp <- lapply(sampleTable[common_vars], factor)  # a list, so each column stays a factor
dge_obj$samples <- cbind(dge_obj$samples, tmp)
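The choice between sapply and lapply matters here. The sketch below compares them on the same mock sampleTable: sapply simplifies the list of factors into a matrix, and a matrix cannot hold factors, so the values become character strings. After cbind() they would end up as character columns (assuming R 4.0 or later, where stringsAsFactors defaults to FALSE). lapply keeps a list of factors.

```r
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
common_vars <- c("a", "b")

# sapply simplifies to a matrix, which flattens the factors to character
via_sapply <- sapply(sampleTable[common_vars], factor)
is.matrix(via_sapply)     # returns TRUE
typeof(via_sapply)        # returns "character"

# lapply returns a list whose elements are still factors
via_lapply <- lapply(sampleTable[common_vars], factor)
vapply(via_lapply, is.factor, logical(1))  # returns c(a = TRUE, b = TRUE)
```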

The two lines of approach 2 can be rewritten as a one-liner.
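One possible single assignment, sketched with lapply so the new columns stay factors; it is not necessarily the form the answer intended. As in the other sketches, dge_obj is assumed to be a plain list with only a samples data.frame. Assigning a named list to `dge_obj$samples[common_vars]` creates the missing columns in one step.

```r
# Assumption: a plain list mimicking only the $samples part of a DGEList
dge_obj <- list(samples = data.frame(group = factor(c(1, 1, 2, 2))))
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
common_vars <- intersect(c("a", "b"), names(sampleTable))

# Add every matching column as a factor in a single assignment
dge_obj$samples[common_vars] <- lapply(sampleTable[common_vars], factor)

names(dge_obj$samples)  # returns c("group", "a", "b")
```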


Data

set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))