I have written a function that calculates the confidence interval of a ratio of averages between two vectors, using jackknife standard errors:
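jackknife_CI = function(x, y, alpha = .05) {
  # leave-one-out means of x and y
  xl = (sum(x, na.rm = T) - x) / (length(x) - 1)
  yl = (sum(y, na.rm = T) - y) / (length(y) - 1)
  n = length(x) + length(y)
  # jackknife standard error of the ratio of means
  jack_se = (sd(c(xl / mean(y, na.rm = T), mean(x, na.rm = T) / yl), na.rm = T) * (n - 1)) / sqrt(n)
  # point estimate plus the standard error scaled by the lower and upper normal quantiles
  mean(x, na.rm = T) / mean(y, na.rm = T) + jack_se * qnorm(c(alpha/2, 1 - alpha/2))
}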
I want to then use it with the ToothGrowth dataset in the following way:
library(dplyr)

df1 =
  ToothGrowth %>%
  filter(supp == "OJ") %>%
  rename(len_x = len) %>%
  select(dose, len_x)
df2 =
  ToothGrowth %>%
  filter(supp == "VC") %>%
  rename(len_y = len) %>%
  select(dose, len_y)
df = cbind(df1, df2)
df = df[, -3]  # drop the duplicated dose column
jack_CI = df %>% group_by(dose) %>% jackknife_CI(x = len_x, y = len_y)
My problem is that the last line results in the error:
Error in jackknife_CI(., x = len_x, y = len_y) : object 'len_x' not found
How do I get around this?
CodePudding user response:
To run the function on the whole data frame, the last line needs to be:
jack_CI = jackknife_CI(x = df$len_x, y = df$len_y)
The way you are running it is being interpreted as follows:
jack_CI = jackknife_CI(group_by(df, dose), x = len_x, y = len_y)
This causes a couple of issues:
jackknife_CI is not expecting the data frame as an argument. The pipe operator inserts it as the first unnamed argument, and because x and y are given by name, it ends up matched to alpha.
len_x and len_y are not recognized outside of the data frame.
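Both issues can be reproduced in a small sketch. The functions where_is_df and uses_x below are hypothetical stand-ins that only share the signature of jackknife_CI, and the toy data frame is made up for illustration. The last line uses base R's with() as one possible way to make the columns visible to the call; it is not something the original code does.

```r
# Hypothetical stand-in with the same signature as jackknife_CI.
# It only reports whether the data frame landed in `alpha`.
where_is_df <- function(x, y, alpha = 0.05) {
  if (is.data.frame(alpha)) "alpha" else "elsewhere"
}

# Toy data, made up for this illustration
df <- data.frame(len_x = c(1, 2, 3), len_y = c(4, 5, 6))

# df %>% f(x = ..., y = ...) is rewritten to f(df, x = ..., y = ...).
# x and y are matched by name, so the unnamed df falls into alpha.
matched <- where_is_df(df, x = 1, y = 2)

# Hypothetical function that forces x, as sum(x, ...) does in jackknife_CI
uses_x <- function(x, y, alpha = 0.05) sum(x)

# len_x is looked up in the calling environment, not in df, so this fails
msg <- tryCatch(uses_x(df, x = len_x, y = len_y),
                error = function(e) conditionMessage(e))

# Evaluating the call inside the data frame makes the columns visible
ok <- with(df, uses_x(x = len_x, y = len_y))
```

Here matched records which argument received the data frame, msg holds the error text, and ok is the sum of the len_x column.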
If you want to run the function on each group you can do:
df %>%
  group_by(dose) %>%
  do({
    ci <- jackknife_CI(.$len_x, .$len_y)
    tibble(low = ci[1], hi = ci[2])
  })
I use do because the function returns two values; otherwise you would be able to just use summarize. Each group is passed to do, which returns a tibble (note the last line inside do), and these tibbles are then stacked to form the result. I refer to each group inside do with .$variable_name, where the dot references the value being passed in (in this case the data frame for each group).
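The same split, apply and stack pattern can also be sketched in base R, without dplyr. This is only an illustration of what happens per group, not a replacement for the approach above. It assumes the two missing operators in jackknife_CI are `+`, and it rebuilds df directly from ToothGrowth, relying on the OJ and VC rows sharing the same dose order.

```r
# jackknife_CI as in the question, assuming the missing operators are `+`
jackknife_CI <- function(x, y, alpha = 0.05) {
  xl <- (sum(x, na.rm = TRUE) - x) / (length(x) - 1)
  yl <- (sum(y, na.rm = TRUE) - y) / (length(y) - 1)
  n <- length(x) + length(y)
  jack_se <- sd(c(xl / mean(y, na.rm = TRUE), mean(x, na.rm = TRUE) / yl),
                na.rm = TRUE) * (n - 1) / sqrt(n)
  mean(x, na.rm = TRUE) / mean(y, na.rm = TRUE) +
    jack_se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

# Pair OJ and VC measurements row by row, as cbind does in the question
oj <- ToothGrowth[ToothGrowth$supp == "OJ", ]
vc <- ToothGrowth[ToothGrowth$supp == "VC", ]
df <- data.frame(dose = oj$dose, len_x = oj$len, len_y = vc$len)

# Split by dose, compute one interval per group, then stack the rows
ci_by_dose <- do.call(rbind, lapply(split(df, df$dose), function(g) {
  ci <- jackknife_CI(g$len_x, g$len_y)
  data.frame(dose = g$dose[1], low = ci[1], hi = ci[2])
}))

print(ci_by_dose)
```

Each element produced by split plays the role of the dot inside do, and do.call(rbind, ...) does the stacking.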