I'm trying to apply a custom function row-by-row to a data frame. The function is a t-test that uses summary statistics stored in the data frame, and I want to append its four outputs (Difference of means, Std Error, t, p-value) to the existing data frame as new columns. The function itself works fine, but I am not sure where or how to map the function's arguments to the columns of the data frame.
Here is a sample of the data frame:
MPAmeans <- data.frame(
  group = c("ccfrp","kelp","beach","intertidal"),
  MPA_mean = c(935,974,50.8,935),
  reference_mean = c(388,388,37.6,266),
  sd_MPA = c(208, 488, 85.9, 60),
  sd_reference = c(170, 170, 62, 151),
  n_MPA = c(3,3,3,3),
  n_reference = c(3,3,3,3)
)
And the function I want to apply to every row...
t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
if( equal.variance==FALSE )
{
se <- sqrt( (s1^2/n1) + (s2^2/n2) )
# welch-satterthwaite df
df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
} else
{
# pooled standard deviation, scaled by the sample sizes
se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
df <- n1+n2-2
}
t <- (m1-m2-m0)/se
dat <- c(m1-m2, se, t, 2*pt(-abs(t),df)) # two-tailed p-value. Replace with "pt(t,df)" for a one-tailed test of m2 > m1.
names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
return(dat)
}
I'm using apply(), and have listed the function reference to the data frame below...
apply(MPAmeans,
1,
t.test2)
m1=reference_mean,
m2=MPA_mean,
s1=sd_reference,
s2=sd_MPA,
n1=n_reference,
n2=n_MPA
How do I reference/call the function variables in apply() and then append the four new columns to the data frame?
CodePudding user response:
We can either pass the arguments one by one inside an anonymous function (function(x)) while looping over the rows, or build a named list of arguments for each row and pass it to do.call:
# columns reordered to match m1, m2, s1, s2, n1, n2
apply(MPAmeans[c(3, 2, 5, 4, 7, 6)], 1, function(x)
  do.call(t.test2, setNames(as.list(x), names(formals(t.test2))[1:6])))
-output
                             [,1]         [,2]        [,3]          [,4]
Difference of means -547.00000000 -586.0000000 -13.2000000 -6.690000e+02
Std Error            155.09566940  298.3532582  61.1631970  9.381009e+01
t                     -3.52685541   -1.9641146  -0.2158161 -7.131429e+00
p-value                0.02590082    0.1632975   0.8406800  8.798398e-03
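The output above has one column per row of the data frame, so it still needs to be transposed before it can be appended. A minimal sketch of that last step follows, assuming the sample data from the question. Here `welch_row` is a hypothetical, compact stand-in for the Welch branch of t.test2 (it is not the asker's full function), and `MPAmeans_out` is just an illustrative name.

```r
# Sample data from the question
MPAmeans <- data.frame(
  group = c("ccfrp", "kelp", "beach", "intertidal"),
  MPA_mean = c(935, 974, 50.8, 935),
  reference_mean = c(388, 388, 37.6, 266),
  sd_MPA = c(208, 488, 85.9, 60),
  sd_reference = c(170, 170, 62, 151),
  n_MPA = c(3, 3, 3, 3),
  n_reference = c(3, 3, 3, 3)
)

# Hypothetical stand-in for t.test2 with equal.variance = FALSE (Welch only)
welch_row <- function(m1, m2, s1, s2, n1, n2, m0 = 0) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)
  df <- (s1^2 / n1 + s2^2 / n2)^2 /
    ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  t <- (m1 - m2 - m0) / se
  c("Difference of means" = m1 - m2, "Std Error" = se,
    "t" = t, "p-value" = 2 * pt(-abs(t), df))
}

# Select columns by name, in the order m1, m2, s1, s2, n1, n2
cols <- c("reference_mean", "MPA_mean", "sd_reference", "sd_MPA",
          "n_reference", "n_MPA")

# apply() returns a 4 x nrow matrix: one column per data frame row.
# unname() drops the column names so the values are matched by position.
res <- apply(MPAmeans[cols], 1,
             function(x) do.call(welch_row, unname(as.list(x))))

# Transpose so each result becomes a row, then bind to the original data
MPAmeans_out <- cbind(MPAmeans, t(res))
```

Selecting the columns by name rather than by position is a design choice here: it keeps the mapping to m1, m2, s1, s2, n1, n2 readable and robust if the column order of the data frame changes.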
CodePudding user response:
Another approach is modifying your existing function such that it is vectorised.
t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
if(!equal.variance)
{
se <- sqrt( (s1^2/n1) + (s2^2/n2) )
# welch-satterthwaite df
df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
} else
{
# pooled standard deviation, scaled by the sample sizes
se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
df <- n1+n2-2
}
t <- (m1-m2-m0)/se
# one result vector (difference, se, t, p) per input element
dat <- vapply(seq_along(m1),
function(x){c(m1[x]-m2[x], se[x], t[x], 2*pt(-abs(t[x]),df[x]))},
numeric(4)) # two-tailed p-value. Replace with "pt(t[x],df[x])" for a one-tailed test of m2 > m1.
dat <- t(dat)
dat <- as.data.frame(dat)
names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
return(dat)
}
This approach lets you pass in vectors for each input and returns a data frame with one row per input element. It uses vapply to build a vector of length 4 for each element.
Under this approach, you can simply call
t.test2(MPAmeans$reference_mean, MPAmeans$MPA_mean, MPAmeans$sd_reference, MPAmeans$sd_MPA, MPAmeans$n_reference, MPAmeans$n_MPA)
(or whatever you end up calling your variables)
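To get the four columns onto the original data frame, the result of a vectorised call can be passed to cbind. The sketch below assumes the sample data from the question and uses a hypothetical, shortened function `t_test_summary` (Welch case only, with syntactic column names of my own choosing) rather than the full t.test2 above. It relies on the fact that the arithmetic operators and pt() are already vectorised in R, so no explicit loop is needed.

```r
# Sample data from the question
MPAmeans <- data.frame(
  group = c("ccfrp", "kelp", "beach", "intertidal"),
  MPA_mean = c(935, 974, 50.8, 935),
  reference_mean = c(388, 388, 37.6, 266),
  sd_MPA = c(208, 488, 85.9, 60),
  sd_reference = c(170, 170, 62, 151),
  n_MPA = c(3, 3, 3, 3),
  n_reference = c(3, 3, 3, 3)
)

# Hypothetical vectorised Welch t-test from summary statistics.
# Every argument may be a vector; the result has one row per element.
t_test_summary <- function(m1, m2, s1, s2, n1, n2, m0 = 0) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t <- (m1 - m2 - m0) / se
  data.frame(diff_means = m1 - m2,
             std_error = se,
             t = t,
             p_value = 2 * pt(-abs(t), df))  # two-tailed
}

# with() evaluates the call using the data frame's columns as variables
MPAmeans_out <- cbind(
  MPAmeans,
  with(MPAmeans, t_test_summary(reference_mean, MPA_mean,
                                sd_reference, sd_MPA,
                                n_reference, n_MPA))
)
```

Using with() avoids repeating the MPAmeans$ prefix for every argument, and the plain column names make the new columns easier to reference afterwards than names containing spaces.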