I want to plot only a subset of my data, dependent on another column. I want to do this within ggplot, not by subsetting my data. As a simple example:
ggplot(mtcars, aes(x=hp, y=mpg)) +
  geom_point()
How would I get geom_point to only plot points with cyl == 4?
In my real data it’d depend on the value of another column being TRUE.
CodePudding user response:
You can use the filter function from the dplyr package and then pipe the result directly into the ggplot function.
library(dplyr)
mtcars %>% filter(cyl==4) %>%
  ggplot(aes(x=hp, y=mpg)) +
  geom_point()
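For the case mentioned in the question, where the condition is another column being TRUE, a logical column can be passed to filter() directly. This is only a sketch: the column name is_four_cyl is made up for illustration and stands in for whatever logical column the real data has.

```r
library(dplyr)
library(ggplot2)

# Hypothetical logical column standing in for the asker's TRUE/FALSE column
cars_flagged <- mtcars %>% mutate(is_four_cyl = cyl == 4)

p <- cars_flagged %>%
  filter(is_four_cyl) %>%          # keeps only rows where the column is TRUE
  ggplot(aes(x = hp, y = mpg)) +
  geom_point()

p
```

Because %>% binds more tightly than +, the filtered data reaches ggplot() first and the geom is then added to the resulting plot.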
Edit:
For your more complicated question of using different subsets of the same data, you will need to use the "data" argument within the individual geom calls to override the dataset for that layer.
library(dplyr)
mtcars %>% filter(cyl==4) %>%
  ggplot(aes(x=hp, y=mpg)) +
  geom_point(color="red") +
  geom_line(aes(x=hp, y=mpg), data = filter(mtcars, cyl==6))
CodePudding user response:
Looks like this can be done with geom_point(data = . %>% filter(newcol>4), color="red")
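As far as I understand it, this works because `. %>% filter(...)` builds a function, and a layer's data argument accepts a function that ggplot2 calls with the plot's default data. A minimal sketch of the same idea with a plain anonymous function and base subsetting (using cyl == 4 from the question, since mtcars has no newcol column):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  # the function receives the plot data (mtcars) and returns the rows to draw
  geom_point(data = function(d) d[d$cyl == 4, ], color = "red")

p
```

The full mtcars data stays attached to the plot, so other layers can still use all rows or apply their own subset.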
CodePudding user response:
In this case I think it's best to do the filtering inside the individual geom layers since they're all different subsets of the same data source. In order to prevent the output of %>% going in as the first argument you need to embrace the ggplot() call in curly braces {} and then also wrap the pipe output in curly braces, like this: {.}.
This somewhat unintuitive behaviour of the {magrittr} pipe is lightly documented here.
You can combine multiple conditions in the filter operation by connecting them with logical OR (|) or AND (&) operators.
library(tidyverse)
mtcars %>%
  mutate(newcol = cyl * wt) %>%
  rownames_to_column("car") %>%
  {
    ggplot() +
      geom_point(data = {.} %>% filter(cyl > 4 | qsec < 17),
                 aes(x = hp, y = mpg)) +
      geom_text(data = {.} %>% filter(newcol < 10 | disp < 90),
                aes(x = hp, y = mpg, label = car))
  }
Created on 2022-02-19 by the reprex package (v2.0.1)
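To make the difference between the OR and AND forms of the filter conditions concrete, here is a small sketch that reuses the cyl and qsec thresholds from the example. The object names are made up for illustration.

```r
library(dplyr)

# Rows matching at least one of the two conditions
either <- mtcars %>% filter(cyl > 4 | qsec < 17)

# Rows matching both conditions
both <- mtcars %>% filter(cyl > 4 & qsec < 17)

# Separating conditions with commas in filter() behaves like &
both_commas <- mtcars %>% filter(cyl > 4, qsec < 17)

nrow(either)
nrow(both)
nrow(both_commas)
```

The AND result is always a subset of the OR result, so a layer filtered with & never draws more rows than the same layer filtered with |.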