I have the following function:
```r
estimate = function(df, y_true) {
  R = nrow(df)
  y_estimated = apply(df, 2, mean)
  ((sqrt((y_estimated - y_true)^2 / R)) / y_true) * 100
}
df = iris[1:10, 2:4]
y_true = c(3, 1, 0.4)
estimate(df = df, y_true = y_true)
```
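`estimate()` pairs each column mean with the element of `y_true` in the same position, so the columns of `df` must be in the same order as `y_true`. A minimal sketch that makes the pairing explicit, assuming you are willing to name `y_true` after the columns (the names and the `estimate_checked()` helper are hypothetical additions, not part of the original function):

```r
# Hypothetical variant of estimate(): same formula, but it refuses to run
# unless y_true is named exactly like the columns of df (an assumption).
estimate_checked <- function(df, y_true) {
  stopifnot(identical(names(y_true), colnames(df)))
  R <- nrow(df)
  # colMeans() gives the same result as apply(df, 2, mean) for all-numeric columns
  y_estimated <- colMeans(df)
  ((sqrt((y_estimated - y_true)^2 / R)) / y_true) * 100
}

df <- iris[1:10, 2:4]
y_true <- c(Sepal.Width = 3, Petal.Length = 1, Petal.Width = 0.4)
res <- estimate_checked(df, y_true)
print(res)
```

Because `sqrt(x^2)` equals `abs(x)`, each entry works out to `abs(mean - true) / sqrt(R) / true * 100`.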
User bird provided this, and it works great. However, I also need to find the means by group. If we change df to `df = iris[, 2:5]`, how do I find the means of each column by Species to use in the function? I figured something like this would work, but no luck:
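```r
library(dplyr)
estimate = function(df, y_true, group) {
  R = nrow(df)
  # Does not work: apply() ignores the grouping, and the pipe passes the
  # grouped data as apply()'s first argument, so `df` ends up as MARGIN
  y_estimated = df %>% group_by(group) %>% apply(df, 2, mean)
  ((sqrt((y_estimated - y_true)^2 / R)) / y_true) * 100
}
df = iris[2:5]
y_true = c(3, 1, 0.4)
group = df$Species
estimate(df = df, y_true = y_true, group = group)
```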
Using `colMeans` also did not work.
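A likely reason `colMeans()` failed: it only accepts all-numeric input, and `iris[2:5]` still contains the `Species` factor. A base R sketch (using `split()` rather than dplyr, a swapped-in technique) that drops the factor and computes the column means separately for each species:

```r
df <- iris[2:5]

# colMeans() on a data frame that still has a factor column raises an error
msg <- tryCatch(colMeans(df), error = function(e) conditionMessage(e))
print(msg)

# Keep only the numeric columns, split their rows by Species, then take
# column means per group: one column per species, one row per variable
group_means <- sapply(split(df[sapply(df, is.numeric)], df$Species), colMeans)
print(group_means)
```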
This is an extension of this post which explains the purpose of each variable.
Answer:
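Rather than modifying your function, you can keep it as-is and apply it group-wise to your data. If you use `group_by()` and then `group_modify()`, the function you pass to `group_modify()` receives the data frame subset to the rows of one group. By default the grouping column (`Species`) is left out of that subset, so `estimate()` only sees the three numeric columns. Because `group_modify()` expects the function to return a data frame, the named vector returned by `estimate()` is converted with `as.data.frame.list()`.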
```r
estimate = function(df, y_true) {
  R = nrow(df)
  y_estimated = apply(df, 2, mean)
  ((sqrt((y_estimated - y_true)^2 / R)) / y_true) * 100
}
df = iris[2:5]
y_true = c(3, 1, 0.4)
library(dplyr, warn.conflicts = FALSE)
df %>%
  group_by(Species) %>%
  group_modify(~ as.data.frame.list(estimate(., y_true)))
#> # A tibble: 3 × 4
#> # Groups:   Species [3]
#>   Species    Sepal.Width Petal.Length Petal.Width
#>   <fct>            <dbl>        <dbl>       <dbl>
#> 1 setosa           2.02          6.53        5.44
#> 2 versicolor       1.08         46.1        32.7
#> 3 virginica        0.123        64.4        57.5
```
Created on 2022-02-24 by the reprex package (v2.0.1)
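If you would rather not depend on dplyr, a base R sketch of the same group-wise idea (using `split()` and `sapply()` in place of `group_modify()`, a swapped-in technique) drops the `Species` column, splits the rows by species, and calls the unchanged `estimate()` on each piece:

```r
estimate <- function(df, y_true) {
  R <- nrow(df)
  y_estimated <- apply(df, 2, mean)
  ((sqrt((y_estimated - y_true)^2 / R)) / y_true) * 100
}

df <- iris[2:5]
y_true <- c(3, 1, 0.4)

# Drop the Species factor (column 4) so apply() only sees numbers;
# sapply() puts variables in rows, so t() gives one species per row
res <- t(sapply(split(df[-4], df$Species), estimate, y_true = y_true))
print(round(res, 3))
```

The result is a plain matrix rather than a grouped tibble, but its values should agree with the `group_modify()` output above.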