Sorry, I know this question has practically been asked here already, but I still don't understand how these summary functions (fun.min, fun.max, fun) actually work. In the manual's examples, these functions often appear in the following context:
ggplot(data = <DATA>) +
  stat_summary(
    mapping = aes(<MAPPINGS>),
    fun.min = min,
    fun.max = max,
    fun = median
  )
From what I understand, fun.min = min defines the summary function as the min base R function, but then what happens? How does fun.min know to find the minimum value of our mappings argument? Taking a step back, why does fun.min even have to be defined as the min base R function? Isn't it already defined as finding the minimum value in a vector and returning a single number?
CodePudding user response:
Let's start with a simple example:
library(ggplot2)
base <- ggplot(data = mtcars, mapping = aes(cyl, mpg)) +
  geom_point()
base + stat_summary(
  fun.min = min,
  fun.max = max,
  fun = median,
  color = "red"
)
Now to your questions:
"From what I understand, fun.min = min defines the summary function as the min base R function, however then what happens? How does fun.min know to find the minimum value of our mappings argument?"
Under the hood, stat_summary aggregates the dataset passed via the data argument: for each value or category of the x variable (or, more generally, each combination of x and the group aesthetic) it computes the three statistics defined by fun, fun.min and fun.max for the variable mapped to the y aesthetic. The value computed by fun is then mapped to y, the value of fun.min to ymin and the value of fun.max to ymax.
The result of this step can be inspected with ggplot2::layer_data. For the basic example:
# The 2 selects the second layer: geom_point is the first layer, stat_summary the second
layer_data(last_plot(), 2)
#> x group y ymin ymax PANEL flipped_aes colour size linetype shape fill
#> 1 4 -1 26.0 21.4 33.9 1 FALSE red 0.5 1 19 NA
#> 2 6 -1 19.7 17.8 21.4 1 FALSE red 0.5 1 19 NA
#> 3 8 -1 15.2 10.4 19.2 1 FALSE red 0.5 1 19 NA
#> alpha stroke
#> 1 NA 1
#> 2 NA 1
#> 3 NA 1
Apart from the extra columns added by ggplot2, that is basically the same as
library(dplyr, warn.conflicts = FALSE)
mtcars |>
group_by(x = cyl) |>
summarise(y = median(mpg), ymin = min(mpg), ymax = max(mpg))
#> # A tibble: 3 × 4
#> x y ymin ymax
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4 26 21.4 33.9
#> 2 6 19.7 17.8 21.4
#> 3 8 15.2 10.4 19.2
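The same aggregation can also be sketched in base R, which makes it easier to see that fun, fun.min and fun.max are simply functions that get called on each group's vector of y values. This is only an illustration of the idea, not ggplot2's actual implementation; the helper name summarise_by_x is hypothetical, and it assumes a numeric grouping variable.

```r
# Hypothetical helper (not part of ggplot2): apply three summary functions
# to the y values of each x group, as described above.
# Assumes x is numeric so the group labels can be converted back to numbers.
summarise_by_x <- function(x, y, fun, fun.min, fun.max) {
  groups <- split(y, x)  # one vector of y values per distinct x
  data.frame(
    x    = as.numeric(names(groups)),
    y    = vapply(groups, fun, numeric(1)),      # each call must return one number
    ymin = vapply(groups, fun.min, numeric(1)),
    ymax = vapply(groups, fun.max, numeric(1)),
    row.names = NULL
  )
}

res <- summarise_by_x(mtcars$cyl, mtcars$mpg,
                      fun = median, fun.min = min, fun.max = max)
print(res)
```

Swapping min or max for any other function that turns a numeric vector into a single number changes the ymin or ymax column accordingly.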
"Taking a step back, why does fun.min have to even be defined as the min base R function, since isn't it already defined as finding the minimum value in a vector and then returning a single number?"
The answer is no. fun.min is not defined as the minimum or the min function. It is NULL by default (see ?stat_summary). fun.min supplies the function used to compute the summary statistic that is mapped to the ymin aesthetic. If fun.min were fixed to the minimum, stat_summary wouldn't be that useful. For example, quite often we use fun.min and fun.max to display a confidence interval or the interquartile range:
base + stat_summary(
  fun.min = ~quantile(.x, probs = .25),
  fun.max = ~quantile(.x, probs = .75),
  fun = median,
  color = "red"
)
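To check what such custom functions return for each group, they can be written as ordinary named functions and applied per group in base R. This is a rough sketch; the names q25 and q75 are illustrative and not part of ggplot2. Functions like these could also be passed directly, e.g. fun.min = q25.

```r
# Named equivalents of the formula shorthand used above.
# q25 and q75 are illustrative names, not ggplot2 functions.
q25 <- function(y) unname(quantile(y, probs = 0.25))
q75 <- function(y) unname(quantile(y, probs = 0.75))

# One vector of mpg values per cyl group
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)

# Each function reduces a group's vector to a single number
iqr_bounds <- data.frame(
  cyl  = names(mpg_by_cyl),
  ymin = vapply(mpg_by_cyl, q25, numeric(1)),
  y    = vapply(mpg_by_cyl, median, numeric(1)),
  ymax = vapply(mpg_by_cyl, q75, numeric(1)),
  row.names = NULL
)
print(iqr_bounds)
```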
CodePudding user response:
What do fun.min/fun.max/fun do?
These functions manipulate different parts of the ggplot output. They take a vector of values and produce a single value that is then placed on the figure.
fun.min assigns a value to the lower bound of an interval. Usually, this is the minimum of the y value.
fun.max assigns a value to the upper bound of the interval. Usually, this is the maximum of the y value.
fun assigns a value to the dot that is often in the middle of the interval. Usually, this is the mean of the y value.
You can see this behavior by observing how the plots below vary with the functions passed to these arguments. It helps to think of these as controlling the interval, which can represent whatever you want, rather than as showing the max, min, and mean. For example, instead of displaying the minimum, maximum, and mean values, you could show an interval based on the 1st, 2nd, and 3rd quartiles of the vector by passing functions that estimate each of these quantities. Below, I demonstrate passing different functions.
A plot with the "typical" functions: min, max, mean
set.seed(13)
library(ggplot2)
# An example ggplot
d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
## Typical choice (fun.min = min)
d + stat_summary(fun = mean, fun.min = min, fun.max = max, colour = "red")
Pass fun.min a function that returns 1
d + stat_summary(fun = mean, fun.min = function(x) 1, fun.max = max, colour = "red")
Pass fun.min a function that returns a random value from c(-20, 0, 20)
d + stat_summary(fun = mean,
                 fun.min = function(x) sample(c(-20, 0, 20), size = 1),
                 fun.max = max, colour = "red")
Pass fun a function that returns 1
See the dot move compared to the first plot!
d + stat_summary(fun = function(x) 1, fun.min = min, fun.max = max, colour = "red")
How does it "know" our mappings?
The documentation notes that it can be ambiguous which vector is actually being passed to these arguments. By default, ggplot tries to figure out the orientation of the figure (which variables are on the x versus y axes) and then passes the y-axis values. However, you can set the orientation manually to avoid unexpected results.
Here is the description of the orientation argument to stat_summary:
The orientation of the layer. The default (NA) automatically determines the orientation from the aesthetic mapping. In the rare event that this fails it can be given explicitly by setting orientation to either "x" or "y". See the Orientation section for more detail.
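As a small illustration of setting the orientation explicitly, the sketch below puts mpg on the x axis and the grouping variable on the y axis. It assumes ggplot2 3.3.0 or later, where the orientation argument is available. In this particular plot the automatic detection would most likely pick the same orientation, so the explicit setting is shown only for demonstration.

```r
library(ggplot2)

# mpg is on x and the grouping variable is on y, so the summaries are
# computed over the x values within each y group.
p <- ggplot(mtcars, aes(mpg, factor(cyl))) +
  stat_summary(
    fun = median, fun.min = min, fun.max = max,
    orientation = "y", colour = "red"
  )

# The summary columns are now x, xmin and xmax instead of y, ymin and ymax.
ld <- layer_data(p, 1)
print(ld[, c("y", "x", "xmin", "xmax")])
```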
Alternative geom types
While changing the parameters affects the interval in the examples above, it is also possible to change the geom type within stat_summary. Instead of layering an interval defined by 3 points on top of the plot, we could add a line. You'll see below that the fun.min argument now changes the scale of the axis but not the line's values, whereas fun still determines the line's position.
geom = "line": typical functions
d + stat_summary(fun = mean, fun.min = min, fun.max = max, colour = "red", geom = "line")
geom = "line": pass fun.min a function that returns 5
d + stat_summary(fun = mean, fun.min = function(x) 5, fun.max = max, colour = "red", geom = "line")
geom = "line": pass fun a function that returns 5
d + stat_summary(fun = function(x) 5, fun.min = min, fun.max = max, colour = "red", geom = "line")