Why can facet_wrap() in ggplot2 be expressed with either a tilde (~) or vars()?

A tilde (~) in R generally denotes an anonymous function or formula, if I understand correctly. In ggplot2, you can use facet_wrap() to split your plot into facets based on a factor variable with multiple levels. There are two different ways to express this, and they both produce similar results:

# load starwars and tidyverse
library(tidyverse)
data(starwars)

With a ~:

ggplot(data = starwars, mapping = aes(x = mass)) +
  geom_histogram(fill = "blue", alpha = .2) +
  theme_minimal() +
  facet_wrap(~ gender, nrow = 1)

With vars():

ggplot(data = starwars, mapping = aes(x = mass)) +
  geom_histogram(fill = "blue", alpha = .2) +
  theme_minimal() +
  facet_wrap(vars(gender), nrow = 1)

How are vars() and ~ equivalent in ggplot2? How is ~ being used in a manner that is analogous, or equivalent to, its typical usage as an anonymous function or formula in R? It doesn't seem like it's a function here? Can someone help clarify how vars() and ~ for facet_wrap() denote the same thing?

Answer:

The two plots should be identical.

In ggplot2, vars() is a quoting function: it captures its arguments unevaluated (as quosures) so that facet_wrap() can evaluate them later in the context of the data. Here the argument is the column name that defines the faceting groups. In other words, the column you supply, usually a variable with more than one level, is quoted automatically and then evaluated against the data to split the plot into small panels. I recommend using vars() when you want to write your own function that wraps facet_wrap(); it’s a lot easier, because quoted arguments can be passed through with tidy evaluation.
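As a rough illustration of that last point, here is a minimal sketch of a wrapper around facet_wrap(). The function name facet_hist and its arguments are made up for this example, and it uses the mpg data that ships with ggplot2 so that it runs without extra packages. It assumes a ggplot2 version that supports the {{ }} (embrace) operator inside aes() and vars().

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): histogram of `x`,
# with one panel per level of `facet`.
facet_hist <- function(data, x, facet, bins = 30) {
  ggplot(data, aes(x = {{ x }})) +
    geom_histogram(bins = bins, fill = "blue", alpha = 0.2) +
    theme_minimal() +
    # vars() accepts the embraced argument, so the caller can pass a bare column name
    facet_wrap(vars({{ facet }}), nrow = 1)
}

# Bare column names are passed straight through to aes() and vars()
p <- facet_hist(mpg, hwy, drv)
```

With the question's data, the equivalent call would be facet_hist(starwars, mass, gender). Doing the same with the formula interface means building the formula by hand, which is why vars() tends to be more convenient inside functions.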

The ~, on the other hand, is not an anonymous function here. In base R, ~ creates a formula object, which is simply an unevaluated expression; the lambda shorthand (for example ~ .x + 1) is a purrr convention built on top of that. So facet_wrap(~ variable_name) does not imply the estimation of some formulaic expression. facet_wrap() only reads the variable names out of the formula and uses them as faceting variables, which is why ~ gender and vars(gender) specify the same thing. The formula form is the older interface, and the ggplot2 documentation describes it as kept for compatibility. It’s confusing because we usually use the ~ to denote a relationship between $y$ and $x$. The closer analogy is facet_grid(rows ~ cols), where the variable to the left of the ~ defines the rows of panels and the variable to the right defines the columns. facet_wrap() lays the panels out in one wrapped sequence, so a one-sided formula with the variable on the right-hand side is enough. Note that the faceting variables are separate from the x and y aesthetics specified inside the aes() call: layering on facet_wrap(~ ...) splits the data into one subset per level of the faceting variable and draws the same plot for each subset.
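To see what ~ produces on its own, it can help to inspect a formula in plain R, outside of ggplot2. The snippet below uses only base R. The helper facet_names is a hypothetical name for illustration and is not how ggplot2 is implemented internally; it only shows that the variable names can be recovered from a formula without evaluating anything.

```r
# A one-sided formula is an ordinary R object holding an unevaluated expression
f <- ~ gender

class(f)        # "formula"
is.function(f)  # FALSE: nothing is called or estimated
length(f)       # 2: the `~` operator and the right-hand side
all.vars(f)     # "gender"

# A two-sided formula, the shape used by facet_grid(rows ~ cols)
g <- sex ~ gender
length(g)       # 3: `~`, the left-hand side, the right-hand side
all.vars(g)     # "sex" "gender"

# Hypothetical helper: pull the variable names out of a formula
facet_names <- function(x) all.vars(x)
facet_names(f)  # "gender"
```

Note that gender does not need to exist as an object for any of this to run, which is the sense in which the formula "quotes" the column name much as vars() does.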