R ggplot2 - Understanding the parameters of the aes function

I am learning R and ggplot2 but one thing that really confuses me is the parameters available to the aes function.

I am proficient in programming languages such as Python and Java. In those languages, when you define a function its parameters are fixed in advance, so you know exactly which arguments it can take.

But the aes function seems to work very differently, apart from its 'x' and 'y' parameters. For example:

ggplot(forestarea, aes(income)) +
  geom_bar(aes(fill = region)) +
  labs(x = "Regions", y = "Number of countries",
       title = "Number of countries by income level from each region in the world",
       caption = "The WDI Forest Area Indicator")

In the above code, in the second aes function, the 'fill' parameter seems to be associated with the 'geom_bar' function. Is it actually a parameter of geom_bar?

Then:

ggplot(forestarea, aes(factor(1), fill = income)) +
  geom_bar() +
  coord_polar(theta = "y") +
  theme(axis.line = element_blank(), panel.background = element_blank()) +
  theme(axis.text = element_blank()) +
  theme(axis.ticks = element_blank()) +
  labs(x = NULL, y = NULL, fill = "Income level",
       title = "Proportion of countries by income level",
       caption = "WDI Forest Area Indicator")

This code creates a pie chart, but here the 'fill' parameter is inside an aes call that sits outside the geom_bar function, which confuses me. Is it a parameter of aes or not?

Then:

ggplot(land_and_agrpc, aes(area = AG.LND.FRST.K2, fill = AG.LND.AGRI.ZS, label = country)) +
  geom_treemap() +
  geom_treemap_text() +
  labs(title = "Countries by land area",
       fill = "% of agriculture land",
       caption = "WDI country land area and forest land percentage datasets")

This code creates a treemap, and here the aes function takes an 'area' parameter, which is explained in the treemapify documentation: https://cran.r-project.org/web/packages/treemapify/vignettes/introduction-to-treemapify.html. I am even more confused.

So, how do I interpret the parameters of the aes function, and where do I use them (inside 'ggplot' or inside a 'geom_XXX' function)?

CodePudding user response：

Consider the code chunk below:

library(ggplot2)

df <- data.frame(
  x = c(1, 2), y = c(2, 1)
)

ggplot(df, aes(x, y + 1)) +
  geom_point(colour = "green") +
  geom_line(aes(colour = "blue"))

Here, aes(x, y + 1) means aes(x = x, y = y + 1), which sets the x and y aesthetics that some layers understand from the x and y columns of the dataframe. This works because aes() has three arguments: x, y and .... Since we did not write x = x explicitly, the first argument x is matched to the x parameter by its position in the function call. Aesthetics other than x or y must be named, for example aes(size = 10); they are passed through ... and become part of the mapping (a set of name-expression pairs).
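To make the positional matching concrete, here is a small sketch that inspects what aes() returns. It uses rlang::quo_get_expr() (rlang is installed as a ggplot2 dependency) to look at the stored expression; the variable names positional, named and extra are only for illustration.

```r
library(ggplot2)

# Positional and named forms should produce the same mapping names
positional <- aes(x, y + 1)
named      <- aes(x = x, y = y + 1)

names(positional)  # the first two arguments are matched to x and y
names(named)

# The y aesthetic stores the unevaluated expression, not a value
rlang::quo_get_expr(positional$y)

# Anything other than x or y has to be named and travels through `...`
extra <- aes(x, y, size = 10)
names(extra)
```

Nothing is evaluated at this point: the mapping only records which expression belongs to which aesthetic name.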

Because the expression y = y + 1 is evaluated using 'non-standard evaluation' in aes(), the scoping rules change: the variable y is first looked up among the data columns and only then in the global environment, and hence we can 'calculate' the + 1 on the dataframe columns.
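As a rough check of that lookup order, the sketch below defines a global variable with the same name as a column and then inspects the data the layer actually computed, using layer_data(). The global y is a made-up value purely for the demonstration.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2), y = c(2, 1))

# A global variable that shares its name with a column (illustration only)
y <- c(100, 200)

p <- ggplot(df, aes(x, y + 1)) +
  geom_point()

# The column df$y wins over the global y, so we get df$y + 1
computed_y <- layer_data(p)$y
computed_y
```

If the column did not exist in df, the lookup would fall back to the global y instead.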

It's not the aes() function that determines which argument = value mappings are valid; it is the layers that accept or reject aesthetics. You can find the aesthetics a layer accepts in the documentation of the layer. For example, ?geom_point shows that it understands x, y, alpha, colour, fill, group, shape, size and stroke. You should be able to retrieve the same list by calling your_geom_layer$geom$aesthetics(). Extension packages can define their own layers with their own aesthetics, such as area in the {treemapify} package.
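For instance, a quick way to compare what two layers accept might look like this. It only relies on the $geom$aesthetics() call mentioned above; the exact contents of the vectors can differ between ggplot2 versions, so treat the output as indicative.

```r
library(ggplot2)

# Aesthetics understood by the point and bar geoms
point_aes <- geom_point()$geom$aesthetics()
bar_aes   <- geom_bar()$geom$aesthetics()

point_aes
bar_aes

# 'fill' is accepted by geom_bar(), which is why aes(fill = region) works there
"fill" %in% bar_aes
```

This also answers the question about fill: it is an aesthetic that geom_bar() understands, not a formal argument of aes().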

Additionally, because we've defined aes(x, y + 1) in the main ggplot() call, it will be applied to every geometry or stat layer in that plot, in this case the points and the line. Hence, we do not need to repeat the same mapping in every layer; it is inherited unless you set inherit.aes = FALSE in a layer.
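A minimal sketch of that inheritance, using the same df as above: the point layer inherits the global mapping, while the line layer opts out with inherit.aes = FALSE and supplies its own. The layer numbers passed to layer_data() refer to the order in which the layers were added.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2), y = c(2, 1))

p <- ggplot(df, aes(x, y + 1)) +
  geom_point() +                                      # inherits aes(x, y + 1)
  geom_line(aes(x = x, y = y), inherit.aes = FALSE)   # uses only its own mapping

point_y <- layer_data(p, 1)$y  # computed from y + 1
line_y  <- layer_data(p, 2)$y  # computed from y
point_y
line_y
```

With inherit.aes = FALSE the layer has to supply every required aesthetic itself, which is why x is repeated in the line layer.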

In the point layer we've defined colour = "green" outside the aes() function, so it will be interpreted literally (and follows standard evaluation with the usual scoping rules). People also call this a 'static' mapping, and you can only use it in layers, not globally. In contrast, because we've defined aes(colour = "blue") in the line layer, the "blue" will be interpreted as a categorical variable that participates in a colour scale that has its own palette (a 'dynamic' mapping). If you execute the code, you'll see that the line is not blue, but a salmon-ish colour with a legend that maps the categorical value "blue" to a discrete scale with a 1-colour palette. Because "blue" is a string constant rather than a column of the dataframe, it will be treated as a length 1 vector that is recycled to fit the number of rows in the dataframe.
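You can see the difference without even drawing the plot by looking at the colours each layer ends up with. This is a sketch using layer_data() again; the exact colour chosen for the line depends on the default discrete palette of your ggplot2 version, so I am not stating the hex code here.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2), y = c(2, 1))

p <- ggplot(df, aes(x, y + 1)) +
  geom_point(colour = "green") +       # literal colour, set outside aes()
  geom_line(aes(colour = "blue"))      # the string "blue" is mapped through a scale

point_colour <- layer_data(p, 1)$colour
line_colour  <- layer_data(p, 2)$colour

point_colour  # the literal value is passed through unchanged
line_colour   # a palette colour, not "blue"
```

If you really want a blue line, write geom_line(colour = "blue") instead, just like the point layer does.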

In general, if you want to map something to a scale (including position scales such as x and y), you put it inside aes(). If you want to have a literal interpretation, you put it outside aes() at the relevant layer.