A tilde (~) in R generally denotes an anonymous function or formula, if I understand correctly. In ggplot2, you can use facet_wrap() to split your plot into facets based on a factor variable with multiple levels. There are two different ways to express this, and they both produce similar results:
# load starwars and tidyverse
library(tidyverse)
data(starwars)
With a ~:
ggplot(data = starwars, mapping = aes(x = mass)) +
  geom_histogram(fill = "blue", alpha = .2) +
  theme_minimal() +
  facet_wrap(~ gender, nrow = 1)
With vars():
ggplot(data = starwars, mapping = aes(x = mass)) +
  geom_histogram(fill = "blue", alpha = .2) +
  theme_minimal() +
  facet_wrap(vars(gender), nrow = 1)
How are vars() and ~ equivalent in ggplot2? How is ~ being used in a manner that is analogous, or equivalent to, its typical usage as an anonymous function or formula in R? It doesn't seem like it's a function here? Can someone help clarify how vars() and ~ for facet_wrap() denote the same thing?
Answer:
The two plots should be identical.
In ggplot2, vars() is just a quoting function: it captures its inputs without evaluating them, which in this case is the variable name used to form the faceting groups. In other words, the column you supplied, usually a variable with more than one level, is automatically quoted, then evaluated later in the context of the data to form small panels of plots. I recommend using vars() when you want to write a function that wraps facet_wrap(); it’s a lot easier.
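To make the wrapping point concrete, here is a minimal sketch. The helper name wrap_by() and the data frame sw are illustrative assumptions, not part of ggplot2; the only ggplot2 behaviour relied on is that vars() accepts `...` and quotes whatever column names are passed through.

```r
library(ggplot2)

# Drop rows with missing mass so the histogram builds without warnings.
# (starwars ships with dplyr; `sw` is just a name chosen for this sketch.)
sw <- subset(dplyr::starwars, !is.na(mass))

# vars() only captures the expression `gender`; nothing is looked up yet.
quoted <- vars(gender)
inherits(quoted, "quosures")

# Hypothetical helper: forwards whatever columns the caller names to vars().
wrap_by <- function(...) {
  facet_wrap(vars(...), nrow = 1)
}

p <- ggplot(sw, aes(x = mass)) +
  geom_histogram(bins = 30, fill = "blue", alpha = .2) +
  theme_minimal() +
  wrap_by(gender)
```

Calling wrap_by(gender) here should behave like writing facet_wrap(vars(gender), nrow = 1) directly; the caller passes a bare column name and never has to build a formula by hand.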
The ~, on the other hand, is not acting as a function here. facet_wrap(~ variable_name) does not imply the estimation of some formulaic expression. Rather, ~ variable_name is an ordinary one-sided R formula with a variable on the right-hand side, and a formula is itself just a quoted, unevaluated expression. facet_wrap() accepts it for compatibility with ggplot2's older interface and reads the variable off the right-hand side, which is just the name of the column itself. That is why it ends up meaning the same thing as vars(variable_name). It’s confusing because we usually use the ~ to denote a relationship between $y$ and $x$. A loosely similar reading does exist in facet_grid(): there, the variable to the left of the ~ defines the rows of panels, whereas the variable to the right of the ~ defines the columns. facet_wrap() has no such row/column split; it creates one panel per level of the faceting variable and wraps the panels into a grid (a single row here, because of nrow = 1). Note that the faceting variable is separate from the aesthetics specified inside of the aes() call. Layering on facet_wrap(~ ...) is just a quick way to partition the data, and therefore the plot, across each level of that variable.
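A short sketch of what the ~ actually produces may help. It assumes the starwars data from dplyr; the object names (sw, f, base, and so on) are made up for illustration. As far as I know, the rows-on-the-left, columns-on-the-right reading of a formula applies to facet_grid(), so that is what the last two lines use.

```r
library(ggplot2)

# Drop rows with missing mass so the histogram builds without warnings.
sw <- subset(dplyr::starwars, !is.na(mass))

# In base R, `~ gender` builds a formula object; nothing is evaluated.
f <- ~ gender
class(f)      # "formula"
all.vars(f)   # "gender"

base <- ggplot(sw, aes(x = mass)) +
  geom_histogram(bins = 30, fill = "blue", alpha = .2)

# The same faceting request, written both ways.
p_formula <- base + facet_wrap(f, nrow = 1)
p_vars    <- base + facet_wrap(vars(gender), nrow = 1)

# Both specifications should yield the same set of panels.
layout_formula <- ggplot_build(p_formula)$layout$layout
layout_vars    <- ggplot_build(p_vars)$layout$layout
identical(layout_formula$gender, layout_vars$gender)

# facet_grid(): rows on the left of `~`, columns on the right.
g_formula <- base + facet_grid(sex ~ gender)
g_vars    <- base + facet_grid(rows = vars(sex), cols = vars(gender))
```

The formula is only a container for the column name, so facet_wrap() can treat it the same way it treats the output of vars().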