The summarise_if function is very helpful for summarising several variables. Assume that I need the mean of every numeric variable in my dataset. I can use
df <- as_tibble(iris)
df %>% summarise_if(is.numeric, .fun = mean)
This works perfectly. But assume now that the function in .fun involves two arguments from the dataset (an example is weighted.mean, where the weight variable is Sepal.Length). I tried
df %>% summarise_if(is.numeric, .fun = function(x, w) weighted.mean(x, w), w = Sepal.Length)
The error was
Error in list2(...) : object 'Sepal.Width' not found
I suspect that R did not look for Sepal.Length in df but in the global environment. So I have to use
df %>% summarise_if(is.numeric, .fun = function(x, w) weighted.mean(x, w), w = df$Sepal.Length)
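This works, but referring to df$Sepal.Length is not a good approach. For example, it makes it impossible for me to compute the weighted mean by group, because df$Sepal.Length always has the full length of the data while x only holds the rows of the current group: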
df %>% group_by(Species) %>% summarise_if(is.numeric, .fun = function(x, w) weighted.mean(x, w), w = df$Sepal.Length)
Error: Problem with summarise() column Sepal.Length.
ℹ Sepal.Length = (function (x, w) ....
x 'x' and 'w' must have the same length
ℹ The error occurred in group 1: Species = setosa.
So, how can I use summarise_if or summarise_at with functions involving two variables from the dataset?
CodePudding user response:
If we need to use Sepal.Length as w, concatenate (c) the output from where(is.numeric) with -Sepal.Length to remove that column from the across selection, then apply weighted.mean to the other numeric columns, with w as Sepal.Length:
library(dplyr)
df %>%
  summarise(across(c(where(is.numeric), -Sepal.Length),
                   ~ weighted.mean(., w = Sepal.Length)))
# A tibble: 1 × 3
  Sepal.Width Petal.Length Petal.Width
        <dbl>        <dbl>       <dbl>
1        3.05         3.97        1.29
Or, for a grouped version:
df %>%
  group_by(Species) %>%
  summarise(across(c(where(is.numeric), -Sepal.Length),
                   ~ weighted.mean(., w = Sepal.Length)))
-output
# A tibble: 3 × 4
  Species    Sepal.Width Petal.Length Petal.Width
  <fct>            <dbl>        <dbl>       <dbl>
1 setosa            3.45         1.47       0.248
2 versicolor        2.78         4.29       1.34
3 virginica         2.99         5.60       2.03
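As a quick sanity check of the grouped result, one value can be recomputed by hand. This is only a sketch, assuming dplyr 1.0.0 or later and the built-in iris data; the object names grouped, setosa and manual are illustrative and not part of the original code.

```r
library(dplyr)

df <- as_tibble(iris)

# Grouped weighted means, with Sepal.Length as the weight within each group
grouped <- df %>%
  group_by(Species) %>%
  summarise(across(c(where(is.numeric), -Sepal.Length),
                   ~ weighted.mean(.x, w = Sepal.Length)))

# Recompute the setosa value for Sepal.Width with base R only
setosa <- iris[iris$Species == "setosa", ]
manual <- weighted.mean(setosa$Sepal.Width, w = setosa$Sepal.Length)

# The two values should agree if across() evaluates Sepal.Length on the
# rows of the current group rather than on the whole data frame
all.equal(grouped$Sepal.Width[grouped$Species == "setosa"], manual)
```

Because Sepal.Length is looked up inside the data for each group, x and w always have the same length, which is what the df$Sepal.Length version could not guarantee.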
NOTE: The _if, _at, and _all suffix functions have been superseded by across.
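For reference, the unweighted call from the question can be written with across as well. This is a minimal sketch, assuming dplyr 1.0.0 or later; old_way and new_way are illustrative names.

```r
library(dplyr)

df <- as_tibble(iris)

# Scoped verb, as used in the question
old_way <- df %>% summarise_if(is.numeric, mean)

# across() version: where() takes the place of the predicate argument
new_way <- df %>% summarise(across(where(is.numeric), mean))

# Compare the two results as plain data frames
all.equal(as.data.frame(old_way), as.data.frame(new_way))
```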