I have 3 different datasets (from a longitudinal study): v1, v2 and v3. Each of them has a "gender" variable.
I'd like to plot the count of each gender from each dataset in the same graph (shown as points connected by lines), i.e. the x axis would be "v1", "v2" and "v3", and the y axis would be the count by gender.
I know I can manually create a dataset containing the values I need, but is there a better way? Thank you!
The sample datasets:
a <- c("boy", "girl")
v1 <- data.frame(gender=rep(a, times=c(11,9)))
v2 <- data.frame(gender=rep(a, times=c(8,8)))
v3 <- data.frame(gender=rep(a, times=c(6,4)))
CodePudding user response:
Summarize the datasets:
library(tidyverse)
# stack the three data frames, label each row by its source, then count per dataset and gender
dd <- bind_rows(lst(v1, v2, v3), .id="dataset") %>%
  count(dataset, gender)
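The reason for lst() rather than list() in the step above is that lst() names each element after the variable passed in, and bind_rows() copies those names into the .id column. A minimal check, assuming the sample data from the question (the name waves is only for illustration):

```r
library(dplyr)

# Sample data from the question
a <- c("boy", "girl")
v1 <- data.frame(gender = rep(a, times = c(11, 9)))
v2 <- data.frame(gender = rep(a, times = c(8, 8)))
v3 <- data.frame(gender = rep(a, times = c(6, 4)))

# tibble::lst() auto-names the elements after the variables
waves <- tibble::lst(v1, v2, v3)
names(waves)  # "v1" "v2" "v3"

# One row per dataset/gender combination, with the count in column n
dd <- bind_rows(waves, .id = "dataset") %>%
  count(dataset, gender)
dd
```

With a plain list(v1, v2, v3) the elements are unnamed, so the .id column would hold positions 1, 2, 3 instead of the dataset names.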
Plot:
ggplot(dd, aes(x=dataset, y=n, colour=gender)) +
  geom_point() +
  geom_line(aes(group=gender))
It's conceivable that you could do the count() step within ggplot in a sensible way (using stat_count(), which is what's used internally by geom_bar()), but this seems pretty straightforward. (If you did use stat_count() you'd probably have to repeat it for the geom_point() and geom_line() geoms ... something I keep meaning to do is to write a geom_linespoints that will draw both points and lines, with the same set of position/stats/etc.)
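For comparison, here is a rough sketch of the stat_count() route mentioned above, assuming the sample data from the question. The raw rows are stacked without a count() step (the name raw is only for illustration), and the count stat is requested separately in each layer:

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
a <- c("boy", "girl")
v1 <- data.frame(gender = rep(a, times = c(11, 9)))
v2 <- data.frame(gender = rep(a, times = c(8, 8)))
v3 <- data.frame(gender = rep(a, times = c(6, 4)))

# Stack the raw rows only; counting is left to ggplot
raw <- bind_rows(tibble::lst(v1, v2, v3), .id = "dataset")

# stat = "count" has to be repeated for each geom;
# group = gender makes the line connect counts across datasets
p <- ggplot(raw, aes(x = dataset, colour = gender, group = gender)) +
  geom_point(stat = "count") +
  geom_line(stat = "count")
p
```

This saves the intermediate data frame but repeats the stat in both layers, which is the duplication the answer refers to. In v2 the two genders have the same count, so their points overlap.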