I am trying to transform some data so that the assumptions of linear models (independence, linearity, homogeneity of variance, normality) are met. I want to do this so that I can perform an ANOVA or similar. Square root transforming the response variable within my linear model has worked, but an error appears when I try to log transform.
I have tried:
logCC_emergent_biomass.lm <- lm(log(Total_CC_noAcari_Biomass) ~ Dungfauna * Water * Earthworms, data = biomass)
But this error appears:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y'
Log transforming in this way normally works for me, so I am not sure what is wrong here. The values of the response variable are all small decimals (e.g. 0.001480370); could this be the cause? If so, can anyone point me in the direction of how to transform this data?
These are the residual plots when the data are untransformed:
Answer:
You probably have zeroes in the variable you want to log transform. Example:
df1[1, 1] <- 0
lm(Y ~ log(X1) + X2 + X3, df1)
# Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
# NA/NaN/Inf in 'x'
# In addition: Warning message:
# In log(X1) : NaNs produced
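To check whether this applies to your data, something like the following might help. It is only a sketch: `y` is a made-up vector standing in for your response column (e.g. `biomass$Total_CC_noAcari_Biomass`), and the values in it are invented for illustration.

```r
# Hypothetical response values; replace with your own column
y <- c(0.00148037, 0, 0.0123, NA, 0.0005)

# log() gives -Inf for 0 and NaN for negative values
log_y <- suppressWarnings(log(y))

# Count the zeros and negatives among the non-missing values
n_zero     <- sum(y == 0, na.rm = TRUE)
n_negative <- sum(y < 0, na.rm = TRUE)

# Positions of non-missing values whose log is not finite
bad_idx <- which(!is.finite(log_y) & !is.na(y))

n_zero
n_negative
bad_idx
```

If `n_zero` is greater than zero, the `-Inf` values produced by `log(0)` are the likely source of the error.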
You could consider log1p, which calculates log(1 + x).
# Note: log1p(X1 + 1) is the same as log(X1 + 2)
lm(Y ~ log1p(X1 + 1) + X2 + X3, df1)
# Call:
# lm(formula = Y ~ log1p(X1 + 1) + X2 + X3, data = df1)
#
# Coefficients:
#   (Intercept)  log1p(X1 + 1)             X2             X3
#        2.1257        -1.5689         0.5337         1.0699
However, this changes the interpretation of the coefficients; see the related post on Cross Validated. In any case, you should decide what to do with the zero values.
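As a rough illustration of why the interpretation changes, the sketch below compares log1p with log(1 + x) on a few made-up values (the vector `x` is hypothetical, chosen to include a zero and a value of the size mentioned in the question). For very small inputs, log1p(x) is close to x itself, so it may not behave like an ordinary log transform on data of that scale.

```r
# Hypothetical values: a zero, a small decimal, and two larger numbers
x <- c(0, 0.00148037, 0.5, 10)

# log1p() is finite at zero, unlike log()
at_zero <- log1p(0)

# log1p(x) agrees with log(1 + x) up to numerical tolerance
same_as_log <- isTRUE(all.equal(log1p(x), log(1 + x)))

# For small x, log1p(x) differs from x only slightly
small_diff <- abs(log1p(0.00148037) - 0.00148037)

at_zero
same_as_log
small_diff
```

Whether that is acceptable depends on what the zeros mean in your experiment, so treat this as a starting point rather than a recommendation.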
Data:
df1 <- structure(list(X1 = c(0, -0.564698171396089, 0.363128411337339,
0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894,
-0.0946590384130976, 2.01842371387704, -0.062714099052421), X2 = c(1.30486965422349,
2.28664539270111, -1.38886070111234, -0.278788766817371, -0.133321336393658,
0.635950398070074, -0.284252921416072, -2.65645542090478, -2.44046692857552,
1.32011334573019), X3 = c(-0.306638594078475, -1.78130843398,
-0.171917355759621, 1.2146746991726, 1.89519346126497, -0.4304691316062,
-0.25726938276893, -1.76316308519478, 0.460097354831271, -0.639994875960119
), Y = c(2.00627879909717, 1.08150911284604, 1.41465103918476,
1.37787039819613, 3.04863502238068, -0.828228728348569, 0.198328716326719,
-2.34295203837687, -1.61863179473641, 1.03962922460575)), row.names = c(NA,
-10L), class = "data.frame")