R apply() custom function to every row in data frame

I'm trying to apply a custom function row by row to a data frame. The function is a t-test that uses the summary statistics stored in the data frame, and I want to append the function's four output values (Difference of means, Std Error, t, p-value) to the existing data frame as new columns. The function itself works fine, but I am not sure where or how to map the function's arguments to the columns in the data frame.

Here is a sample of the data frame:

MPAmeans <- data.frame(
  group = c("ccfrp","kelp","beach","intertidal"),
  MPA_mean = c(935,974,50.8,935),
  reference_mean = c(388,388,37.6,266),
  sd_MPA = c(208, 488, 85.9, 60),
  sd_reference = c(170, 170, 62, 151),
  n_MPA = c(3,3,3,3),
  n_reference = c(3,3,3,3)
)

And the function I want to apply to every row...

t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
  if( equal.variance==FALSE )
  {
    se <- sqrt( (s1^2/n1) + (s2^2/n2) )
    # welch-satterthwaite df
    df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
  } else
  {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
    df <- n1+n2-2
  }
  t <- (m1-m2-m0)/se
  dat <- c(m1-m2, se, t, 2*pt(-abs(t),df))  # two-tailed p-value (for one-tailed m2 > m1, use pt(t, df))
  names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
  return(dat)
}

I'm calling it with apply() as shown below, followed by the mapping of function arguments to data frame columns that I want to use:

apply(MPAmeans,
      1,
      t.test2)


m1=reference_mean,
m2=MPA_mean,
s1=sd_reference,
s2=sd_MPA,
n1=n_reference,
n2=n_MPA

How do I pass these columns to the function's arguments inside apply(), and then append the four new columns to the data frame?

CodePudding user response：

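We can either pass the arguments one by one inside an anonymous function (function(x)) in apply(), or build a named list from each row and call the function with do.call. Here the columns are first reordered to match the argument order m1, m2, s1, s2, n1, n2: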

apply(MPAmeans[c(3, 2, 5, 4, 7, 6)], 1, function(x)
   do.call(t.test2, setNames(as.list(x), names(formals(t.test2))[1:6])))

-output

                             [,1]         [,2]        [,3]          [,4]
Difference of means -547.00000000 -586.0000000 -13.2000000 -6.690000e+02
Std Error            155.09566940  298.3532582  61.1631970  9.381009e+01
t                     -3.52685541   -1.9641146  -0.2158161 -7.131429e+00
p-value                0.02590082    0.1632975   0.8406800  8.798398e-03
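
The question also asks how to append the results, and the first option mentioned (passing the arguments one by one) is not shown, so here is a minimal sketch of both. To keep it short and self-contained, t.test2 is condensed to its Welch branch (equal.variance = FALSE), which is an assumption made for illustration. The names num_cols, res and MPAmeans_out are likewise only illustrative.

```r
# Sample data from the question
MPAmeans <- data.frame(
  group = c("ccfrp", "kelp", "beach", "intertidal"),
  MPA_mean = c(935, 974, 50.8, 935),
  reference_mean = c(388, 388, 37.6, 266),
  sd_MPA = c(208, 488, 85.9, 60),
  sd_reference = c(170, 170, 62, 151),
  n_MPA = c(3, 3, 3, 3),
  n_reference = c(3, 3, 3, 3)
)

# Condensed version of the question's t.test2: Welch branch only (assumption)
t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)
  # Welch-Satterthwaite degrees of freedom
  df <- (s1^2 / n1 + s2^2 / n2)^2 /
    ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  t <- (m1 - m2 - m0) / se
  c("Difference of means" = m1 - m2, "Std Error" = se,
    "t" = t, "p-value" = 2 * pt(-abs(t), df))
}

# Drop the character column so apply() works on a numeric matrix
num_cols <- MPAmeans[, -1]

# Pass each argument explicitly by column name
res <- apply(num_cols, 1, function(x) {
  t.test2(m1 = x[["reference_mean"]], m2 = x[["MPA_mean"]],
          s1 = x[["sd_reference"]],   s2 = x[["sd_MPA"]],
          n1 = x[["n_reference"]],    n2 = x[["n_MPA"]])
})

# apply() returns one column per input row, so transpose before binding
MPAmeans_out <- cbind(MPAmeans, t(res))
```

Dropping the group column matters: if a character column is left in, apply() converts every value in the row to a string and the arithmetic inside the function fails.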

CodePudding user response：

Another approach is to modify your existing function so that it is vectorised.

t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
    if(!equal.variance)
    {
        se <- sqrt( (s1^2/n1) + (s2^2/n2) )
        # welch-satterthwaite df
        df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
    } else
    {
        # pooled standard deviation, scaled by the sample sizes
        se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
        df <- n1+n2-2
    }
    t <- (m1-m2-m0)/se
    # build one length-4 result per input element; the p-value is two-tailed
    dat <- vapply(seq_along(m1),
                  function(x){c(m1[x]-m2[x], se[x], t[x], 2*pt(-abs(t[x]),df[x]))},
                  numeric(4))
    dat <- t(dat)  # one row per input element
    dat <- as.data.frame(dat)
    names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
    return(dat)
}

This approach lets you pass in a vector for each input, and it returns a data frame with one row per element of those vectors. It uses vapply to build a numeric vector of length 4 for each element.

With this approach, you can simply call

t.test2(MPAmeans$reference_mean, MPAmeans$MPA_mean, MPAmeans$sd_reference, MPAmeans$sd_MPA, MPAmeans$n_reference, MPAmeans$n_MPA)

(or whatever you end up calling your variables)
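
To append the result to the original data frame, the returned data frame can be bound column-wise. The sketch below uses a condensed, Welch-only stand-in for the vectorised function (the name t_test_welch, and fixing equal.variance = FALSE and m0 = 0, are assumptions for illustration). It relies on R's arithmetic and pt() already being vectorised, so no explicit loop is needed.

```r
# Sample data from the question
MPAmeans <- data.frame(
  group = c("ccfrp", "kelp", "beach", "intertidal"),
  MPA_mean = c(935, 974, 50.8, 935),
  reference_mean = c(388, 388, 37.6, 266),
  sd_MPA = c(208, 488, 85.9, 60),
  sd_reference = c(170, 170, 62, 151),
  n_MPA = c(3, 3, 3, 3),
  n_reference = c(3, 3, 3, 3)
)

# Hypothetical Welch-only helper: every input is a vector, and the result
# has one row per element
t_test_welch <- function(m1, m2, s1, s2, n1, n2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)
  # Welch-Satterthwaite degrees of freedom, computed element-wise
  df <- (s1^2 / n1 + s2^2 / n2)^2 /
    ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  t <- (m1 - m2) / se
  data.frame("Difference of means" = m1 - m2, "Std Error" = se,
             "t" = t, "p-value" = 2 * pt(-abs(t), df),
             check.names = FALSE)  # keep the spaces in the column names
}

# with() avoids repeating MPAmeans$ for every argument
stats <- with(MPAmeans, t_test_welch(reference_mean, MPA_mean,
                                     sd_reference, sd_MPA,
                                     n_reference, n_MPA))

# Append the four new columns to the original data frame
MPAmeans_out <- cbind(MPAmeans, stats)
```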