ggplot same variable from different dataset

Time:10-07

I have 3 different datasets from a longitudinal study: v1, v2 and v3. Each of them has a "gender" variable.

I'd like to plot the count of each gender from each dataset in the same graph (as points connected by lines), i.e. the x axis will be "v1", "v2" and "v3", and the y axis will be the count for each gender.

I know I can manually create a dataset including the values I need, but I'm wondering if there is a better way? Thank you!

The sample datasets:

a <- c("boy", "girl")
v1 <- data.frame(gender=rep(a, times=c(11,9)))
v2 <- data.frame(gender=rep(a, times=c(8,8)))
v3 <- data.frame(gender=rep(a, times=c(6,4)))

Answer:

Summarize the data sets:

library(tidyverse)
dd <- bind_rows(lst(v1, v2, v3), .id="dataset") %>%
    count(dataset, gender)
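If you'd rather not depend on the tidyverse, here is a base R sketch of the same summary: stack the datasets with a column recording where each row came from, then let table() do the counting. The names `datasets`, `long` and `dd_base` are illustrative, not part of the answer above. One difference worth knowing: table() keeps every dataset/gender combination, so a gender absent from one wave shows up with a count of 0 instead of dropping out, as it would with count().

```r
# Sample data from the question
a <- c("boy", "girl")
v1 <- data.frame(gender = rep(a, times = c(11, 9)))
v2 <- data.frame(gender = rep(a, times = c(8, 8)))
v3 <- data.frame(gender = rep(a, times = c(6, 4)))

# Named list, so the names can serve as the "dataset" label
datasets <- list(v1 = v1, v2 = v2, v3 = v3)

# Stack the rows, repeating each dataset's name once per row it contributes
long <- data.frame(
  dataset = rep(names(datasets), vapply(datasets, nrow, integer(1))),
  gender  = unlist(lapply(datasets, function(d) d$gender), use.names = FALSE)
)

# Cross-tabulate, then flatten back to one row per dataset/gender pair;
# as.data.frame() on a table names the count column "Freq"
dd_base <- as.data.frame(table(dataset = long$dataset, gender = long$gender))
names(dd_base)[names(dd_base) == "Freq"] <- "n"
```

`dd_base` has the same dataset, gender and n columns as `dd`, so it can be passed to the same ggplot() call.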

Plot:

ggplot(dd, aes(x=dataset, y=n, colour=gender)) +
   geom_point() +
   geom_line(aes(group=gender))

It's possible to do the count() step inside ggplot instead (using stat_count(), which is what geom_bar() uses internally), but counting first seems more straightforward. (If you did use stat_count(), you'd probably have to repeat it for both geom_point() and geom_line(). Something I keep meaning to do is write a geom_linespoints() that draws both points and lines with the same position, stat, etc.)
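A minimal sketch of the stat_count() route, assuming ggplot2 3.x: each geom tallies the raw, uncounted rows itself, which is why `stat = "count"` has to be given to both layers. `group = gender` is needed because x is discrete; without it each dataset/gender pair would be its own single-point group and no lines would be drawn. The name `long` is illustrative.

```r
library(ggplot2)

# Sample data from the question
a <- c("boy", "girl")
v1 <- data.frame(gender = rep(a, times = c(11, 9)))
v2 <- data.frame(gender = rep(a, times = c(8, 8)))
v3 <- data.frame(gender = rep(a, times = c(6, 4)))

# Stack the raw rows (no pre-counting), tagging each with its source dataset
long <- rbind(
  data.frame(dataset = "v1", gender = v1$gender),
  data.frame(dataset = "v2", gender = v2$gender),
  data.frame(dataset = "v3", gender = v3$gender)
)

# stat = "count" makes each layer count rows per x value within each group,
# and maps the computed count to y; each layer computes its own stat
p <- ggplot(long, aes(x = dataset, colour = gender, group = gender)) +
  geom_point(stat = "count") +
  geom_line(stat = "count")
```

Printing `p` should give the same chart as the count-first version; the trade-off is the repeated `stat = "count"` in every layer.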
