how to use geom_pointrange to plot data in descending order?

Time:03-06

I have been asked to do this: "make a dotplot of the average age for each combination of sex, passengerClass and survival. Use geom_pointrange, order the dots by decreasing age and make sure the labels are on the y-axis." This is my attempt, but the plot it produces doesn't seem right.

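library(carData)   # provides the TitanicSurvival dataset
library(dplyr)
library(tidyr)     # unite()
library(ggplot2)

# summary of the dataset given to me
TitanicSurvival %>% group_by(survived,passengerClass,sex) %>% 
  summarize(age=mean(age, na.rm=TRUE)) %>%
  unite(survived, passengerClass, sex, col="group", remove = FALSE) -> age
glimpse(age)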

# my attempt to plot
age %>% 
    unite(sex, passengerClass, survived, col="group", remove = FALSE) %>%
    ggplot(aes(x = age, y= group)) +
    geom_dotplot(binwidth = 1) +
    geom_pointrange(aes(xmin = sort(age), xmax = sort(age)), orientation = "y")


Answer:

This is generally easier to reproduce if you provide (a representative portion of) the actual data, e.g. the output of dput(head(TitanicSurvival, 100)). But as this is a fairly popular dataset, I already had it available.

The exact wording of your task may have led to some confusion here: geom_dotplot is essentially a variant of a histogram (it bins a single continuous variable and stacks one dot per observation), and I believe the intention was to ask for plotting the data points themselves (usually done with geom_point). Specifying geom_pointrange implies that you should also show a range, even though the task doesn't explicitly ask for one.
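To make the difference concrete, here is a minimal sketch, assuming the carData, dplyr and ggplot2 packages are installed (the object names titanic, p_binned, group_means and p_points are illustrative). geom_dotplot() stacks one dot per passenger inside each age bin, whereas a Cleveland-style dot plot built with geom_point() draws exactly one point per sex/class/survival group:

```r
# Sketch: assumes the carData, dplyr and ggplot2 packages are installed.
library(dplyr)
library(ggplot2)

titanic <- carData::TitanicSurvival

# geom_dotplot() bins one continuous variable, much like a histogram:
# each stacked dot stands for one passenger falling into an age bin.
p_binned <- ggplot(filter(titanic, !is.na(age)), aes(x = age)) +
  geom_dotplot(binwidth = 1)

# A Cleveland-style dot plot instead draws one point per group summary.
group_means <- titanic %>%
  group_by(survived, passengerClass, sex) %>%
  summarize(age_mean = mean(age, na.rm = TRUE), .groups = "drop") %>%
  mutate(group_label = paste(sex, passengerClass, survived, sep = "_"))

p_points <- ggplot(group_means, aes(x = age_mean, y = group_label)) +
  geom_point()

# The built point layer holds one row per group present in the data.
nrow(layer_data(p_points))
```

Printing p_binned and p_points side by side shows the histogram-like shape of the first versus the one-point-per-label layout the task asks for.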

So here is my take on it. First we prepare all the statistics and labels we may want to plot. The trickiest step is the last one: setting the factor levels of the labels according to the order of the average age, so that they show up in that order on the y-axis:

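library(dplyr)
library(carData)   # provides the TitanicSurvival dataset

df_age <- TitanicSurvival %>% 
  group_by(survived, passengerClass, sex) %>% 
  dplyr::summarize(
    age_min = min(age, na.rm = TRUE),
    age_mean = mean(age, na.rm = TRUE),
    age_max = max(age, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::mutate(
    group_label = glue::glue("{sex}_{passengerClass}_{survived}")
  ) %>%
  # sort the rows from youngest to oldest average age
  dplyr::arrange(age_mean) %>%
  # as_factor() uses the order of first appearance as the level order;
  # ggplot2 draws the first level at the bottom of a discrete y-axis,
  # so the oldest group ends up at the top
  dplyr::mutate(
    group_label = forcats::as_factor(group_label)
  )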
# A tibble: 12 x 7
#   survived passengerClass sex    age_min age_mean age_max group_label   
#   <chr>    <chr>          <chr>    <dbl>    <dbl>   <dbl> <fct>         
# 1 yes      2nd            male     0.667     17.4      62 male_2nd_yes  
# 2 yes      3rd            female   0.167     20.8      63 female_3rd_yes
# 3 yes      3rd            male     0.417     22.4      45 male_3rd_yes  
# 4 no       3rd            female   1         23.4      48 female_3rd_no 
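As a side note, forcats::fct_reorder() can produce the same level order in one step instead of arrange() followed by as_factor(). The following is a minimal sketch, assuming the carData, dplyr and forcats packages are installed (df_means, levels_arranged and levels_reordered are illustrative names). It builds the levels both ways and compares them; they should agree as long as no two groups share exactly the same mean age:

```r
# Sketch: assumes the carData, dplyr and forcats packages are installed.
library(dplyr)
library(forcats)

df_means <- carData::TitanicSurvival %>%
  group_by(survived, passengerClass, sex) %>%
  summarize(age_mean = mean(age, na.rm = TRUE), .groups = "drop") %>%
  mutate(group_label = paste(sex, passengerClass, survived, sep = "_"))

# Approach from the answer: sort the rows, then freeze that order as levels.
levels_arranged <- df_means %>%
  arrange(age_mean) %>%
  mutate(group_label = as_factor(group_label)) %>%
  pull(group_label) %>%
  levels()

# Alternative: reorder the levels by age_mean without touching the row order.
# Each label occurs once, so its summary (median by default) is its own mean.
levels_reordered <- df_means %>%
  mutate(group_label = fct_reorder(group_label, age_mean)) %>%
  pull(group_label) %>%
  levels()

identical(levels_arranged, levels_reordered)
```

Passing .desc = TRUE to fct_reorder() would flip the order and put the oldest group at the bottom of the y-axis instead.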

Then we can plot the points using geom_pointrange (we already took care of the correct ordering when we set the factor levels!):

library(ggplot2)

ggplot(df_age, aes(x = age_mean, y = group_label)) +
  geom_pointrange(aes(xmin = age_min, xmax = age_max)) +
  theme_bw() +
  labs(x = "Age / Years", y = "")
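If you would rather not store the ordering in the data frame, base R's stats::reorder() can do it inside the aesthetic mapping. This is an alternative sketch rather than the approach above, assuming carData, dplyr and ggplot2 are installed (df_means, p, ld and vals are illustrative names). It also checks the result through ggplot2::layer_data(): reading the built points from the top of the y-axis down should give decreasing mean ages.

```r
# Sketch: assumes the carData, dplyr and ggplot2 packages are installed.
library(dplyr)
library(ggplot2)

df_means <- carData::TitanicSurvival %>%
  group_by(survived, passengerClass, sex) %>%
  summarize(
    age_min = min(age, na.rm = TRUE),
    age_mean = mean(age, na.rm = TRUE),
    age_max = max(age, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(group_label = paste(sex, passengerClass, survived, sep = "_"))

# reorder() turns the character labels into a factor whose levels follow
# age_mean (ascending), so the ordering happens inside the mapping itself.
p <- ggplot(df_means, aes(x = age_mean, y = reorder(group_label, age_mean))) +
  geom_pointrange(aes(xmin = age_min, xmax = age_max)) +
  theme_bw() +
  labs(x = "Age / Years", y = NULL)

# On a discrete y scale the first level sits at y = 1 (the bottom), so
# sorting the built points by descending y reads the axis top to bottom.
ld <- layer_data(p)
vals <- ld$x[order(ld$y, decreasing = TRUE)]
vals
```

Using y = NULL in labs() drops the axis title entirely, which has the same visual effect as the empty string in the answer.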

