I am new to coding and am doing some gene expression analysis. I have a very naive question. I have a few gene expression data frames with gene names as rows and cell names as columns. I want to log2 transform the data, but I am confused between log and log + 1. How do I perform a log2 + 1 (log(x + 1)) transformation of a data frame in R? Is it the same as a log2 transformation? Should I do t = log(v + 1)?
Any help will be appreciated.
CodePudding user response:
For example, with dummy data:
dummy <- data.frame(
x = c(1,2,3,4,5),
y = c(2,3,4,5,6)
)
dummy
x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
If you just want to log2 transform the data, use log(., base = 2), like
log(dummy, base = 2)
x y
1 0.000000 1.000000
2 1.000000 1.584963
3 1.584963 2.000000
4 2.000000 2.321928
5 2.321928 2.584963
If you want log2(x + 1), then use log(dummy + 1, base = 2), or if you want log2(x) + 1, just use log(dummy, base = 2) + 1.
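To tie this back to the question of whether log2(x + 1) is the same as log2(x): it is not, and the difference matters most when the data contain zeros. Below is a minimal sketch, assuming a made-up data frame `expr` with hypothetical gene and cell names (none of these values come from the question).

```r
# Hypothetical expression data: genes as row names, cells as columns
expr <- data.frame(
  cell1 = c(0, 3, 7),
  cell2 = c(1, 0, 15),
  row.names = c("geneA", "geneB", "geneC")
)

# Plain log2: zeros become -Inf
log2_plain <- log(expr, base = 2)

# log2(x + 1): zeros map to 0, so every value stays finite
log2_plus1 <- log(expr + 1, base = 2)

log2_plain["geneA", "cell1"]  # -Inf
log2_plus1["geneA", "cell1"]  # 0
log2_plus1["geneC", "cell2"]  # 4, since log2(15 + 1) = log2(16)
```

Row and column names are kept by both transformations, so gene and cell labels survive.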
CodePudding user response:
Park's answer gives the simplest way to log transform a numeric-only data.frame, but log(x + 1, base = b) is a different problem.
log(x + 1)
But if the transformation is y <- log(x + 1) (it could be base 2), then beware of floating-point issues. For very small values of abs(x), the results of log(x + 1, base = b) are unreliable, because part of x is rounded away when it is added to 1.
x <- seq(.Machine$double.eps, .Machine$double.eps^0.5, length.out = 10)
eq <- log(x + 1) == log1p(x)
eq
#[1] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
which(eq)
#[1] 1 4 7 10
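A more extreme case may make the cause easier to see. This is only an illustrative sketch; the value `tiny` is an arbitrary choice, picked to be far smaller than .Machine$double.eps.

```r
tiny <- 1e-20          # far below .Machine$double.eps (about 2.2e-16)

(tiny + 1) == 1        # TRUE: the addition rounds back to exactly 1
log(tiny + 1)          # 0: all information about tiny is lost
log1p(tiny)            # 1e-20: log1p never forms the sum 1 + tiny
```

For values this small, log(x + 1) returns exactly zero while log1p(x) still returns a result close to x.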
This is why base R has a function log1p. To compute log(x + 1, base = 2) or, equivalently, log2(x + 1), use
log2p1 <- function(x) log1p(x)/log(2)
eq2 <- log2(x + 1) == log2p1(x)
eq2
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
which(eq2)
#[1] 7 10
In both cases, the difference between log(x + 1) and the numerically more accurate version is smaller in absolute value than .Machine$double.eps.
abs(log(x + 1) - log1p(x)) < .Machine$double.eps
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
abs(log2(x + 1) - log2p1(x)) < .Machine$double.eps
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
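To apply log2p1 to a whole expression data frame, one possible approach is to go through a numeric matrix and convert back. This is a sketch under assumptions: the data frame `expr` and its gene and cell names are made up for illustration, and all columns are assumed to be numeric.

```r
# Same helper as in the answer above
log2p1 <- function(x) log1p(x) / log(2)

# Hypothetical expression data: genes as row names, cells as columns
expr <- data.frame(
  cell1 = c(0, 3, 7),
  cell2 = c(1, 0, 15),
  row.names = c("geneA", "geneB", "geneC")
)

# Go through a numeric matrix, then restore the data frame shape;
# row and column names are carried along
expr_log <- as.data.frame(log2p1(as.matrix(expr)))
expr_log
```

For typical count data the result agrees with log2(expr + 1) to within rounding error; the log1p route only matters when values very close to zero are present.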