Using group_by in function

Time:10-21

I have written a function that calculates the confidence interval of a ratio of averages between two vectors, using jackknife standard errors:


jackknife_CI = function(x, y, alpha = .05) {

 xl = (sum(x,na.rm=T) - x) / (length(x) - 1)
 yl = (sum(y,na.rm=T) - y) / (length(y) - 1)
 n = length(x) + length(y)
 
 jack_se = (sd(c(xl / mean(y,na.rm=T), mean(x,na.rm=T) / yl),na.rm=T) * (n - 1)) / sqrt(n)

 mean(x, na.rm = T) / mean(y, na.rm = T) + jack_se * qnorm(c(alpha/2,1-alpha/2))
}

I want to then use it with the ToothGrowth dataset in the following way:


library(dplyr)

df1 =
  ToothGrowth %>%
  filter(supp == "OJ") %>% 
  rename(len_x = len) %>% 
  select(dose,len_x)

df2 =
  ToothGrowth %>%
  filter(supp == "VC") %>% 
  rename(len_y = len) %>% 
  select(dose, len_y)

df = cbind(df1,df2)
df = df[,-3]
jack_CI = df %>% group_by(dose) %>% jackknife_CI(x = len_x, y = len_y)

My problem is that the last line results in the error:

Error in jackknife_CI(., x = len_x, y = len_y) : object 'len_x' not found

How do I get around this?

CodePudding user response:

The last line needs to be:

jack_CI = jackknife_CI(x = df$len_x, y = df$len_y)

As you wrote it, the pipe expands the call to the following:

jack_CI = jackknife_CI(group_by(df, dose), x = len_x, y = len_y)

This causes a couple of issues:

  • jackknife_CI does not expect a data frame as an argument, but the pipe operator passes one as the first unnamed argument.
  • len_x and len_y are columns of the data frame, so they are not found when referenced as bare names outside it.
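The two issues above can be reproduced in isolation. The sketch below uses a hypothetical stand-in function, which_arg, that only shares its argument names with jackknife_CI; it is not part of the question. Because x and y are given by name, the piped data frame is matched to the only remaining argument, alpha, and a bare column name such as dose fails as soon as the function evaluates it.

```r
library(magrittr)

# Hypothetical stand-in with the same argument names as jackknife_CI.
which_arg <- function(x, y, alpha = 0.05) {
  force(x)          # evaluate x, as sum(x) would inside jackknife_CI
  class(alpha)[1]   # report what ended up bound to alpha
}

# `lhs %>% f(...)` is evaluated as `f(lhs, ...)`.
piped  <- ToothGrowth %>% which_arg(x = 1, y = 2)
direct <- which_arg(ToothGrowth, x = 1, y = 2)

# A bare column name is looked up in the calling environment, not in the data frame.
err <- tryCatch(
  ToothGrowth %>% which_arg(x = dose, y = 2),
  error = function(e) e
)
```

Here piped and direct hold the same value, and err captures an "object not found" error of the same kind as in the question.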

If you want to run the function on each group, you can do:

df %>% group_by(dose) %>% 
  do({
    ci <- jackknife_CI(.$len_x, .$len_y)
    tibble(low = ci[1], hi = ci[2])
  })

I use do because the function returns two values; otherwise you could just use summarize. Each group is passed to do, which returns a tibble (note the last line inside do), and these tibbles are then stacked to form the result. Inside do, I refer to each group's columns with .$variable_name, where the dot stands for the value being passed in (in this case, the data frame for one group).
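As a possible alternative, recent dplyr versions mark do() as superseded, and group_modify() can play the same role. The following is a minimal sketch, not the only way to do it. It assumes the `+` operators in the question's function were lost when the page was rendered, and it rebuilds df directly from ToothGrowth, relying on the OJ and VC rows being in the same dose order.

```r
library(dplyr)

# Jackknife CI from the question, assuming the missing operators are `+`.
jackknife_CI <- function(x, y, alpha = 0.05) {
  xl <- (sum(x, na.rm = TRUE) - x) / (length(x) - 1)  # leave-one-out means of x
  yl <- (sum(y, na.rm = TRUE) - y) / (length(y) - 1)  # leave-one-out means of y
  n  <- length(x) + length(y)
  jack_se <- sd(c(xl / mean(y, na.rm = TRUE),
                  mean(x, na.rm = TRUE) / yl), na.rm = TRUE) * (n - 1) / sqrt(n)
  mean(x, na.rm = TRUE) / mean(y, na.rm = TRUE) +
    jack_se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

# Wide data frame: one row per pair of OJ / VC observations.
oj <- ToothGrowth %>% filter(supp == "OJ")
vc <- ToothGrowth %>% filter(supp == "VC")
df <- data.frame(dose = oj$dose, len_x = oj$len, len_y = vc$len)

# .x is the data frame for one group (without the grouping column).
res <- df %>%
  group_by(dose) %>%
  group_modify(~ {
    ci <- jackknife_CI(.x$len_x, .x$len_y)
    tibble(low = ci[1], hi = ci[2])
  })
```

The result has one row per dose with the columns dose, low and hi, the same shape the do version produces.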
