I have been asked to do the following: "make a dotplot of the average age for each combination of sex, passengerClass and survival. Use geom_pointrange, order the dots by decreasing age and make sure the labels are on the y-axis." This is my attempt, but the plot it produces doesn't seem right.
# the dataset given to me
TitanicSurvival %>% group_by(survived,passengerClass,sex) %>%
summarize(age=mean(age, na.rm=T)) %>%
unite(survived, passengerClass, sex, col="group", remove = FALSE) -> age
glimpse(age)
# my attempt to plot
age %>%
  unite(sex, passengerClass, survived, col="group", remove = FALSE) %>%
  ggplot(aes(x = age, y= group)) +
  geom_dotplot(binwidth = 1) +
  geom_pointrange(aes(xmin = sort(age), xmax = sort(age)), orientation = "y")
CodePudding user response:
This is generally easier to reproduce if you provide a representative portion of the actual data, e.g. the output of dput(head(TitanicSurvival, 100)). But as this is a fairly popular dataset, I already had it available.
The exact wording of your task may have led to some confusion here: geom_dotplot is essentially a variant of a histogram, and I believe the intention was to ask for a plot of the data points (usually geom_point). Specifying the geom_pointrange function indicates that you should also show a range, although a range is not actually asked for.
So, here is my take on it. First we prepare all statistics and labels we may want to plot. The trickiest step is the last one - setting the factor levels of the labels based on the order of the average age, so they will show up in that order on the y-axis:
library(dplyr)  # provides %>% and group_by()

df_age <- TitanicSurvival %>%
  group_by(survived, passengerClass, sex) %>%
  # one row per group with the minimum, mean and maximum age
  dplyr::summarize(
    age_min = min(age, na.rm = TRUE),
    age_mean = mean(age, na.rm = TRUE),
    age_max = max(age, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::mutate(
    group_label = glue::glue("{sex}_{passengerClass}_{survived}")
  ) %>%
  # sort by mean age, then fix the factor levels in that order
  dplyr::arrange(age_mean) %>%
  dplyr::mutate(
    group_label = forcats::as_factor(group_label)
  )
# A tibble: 12 x 7
#   survived passengerClass sex    age_min age_mean age_max group_label
#   <chr>    <chr>          <chr>    <dbl>    <dbl>   <dbl> <fct>
# 1 yes      2nd            male     0.667     17.4      62 male_2nd_yes
# 2 yes      3rd            female   0.167     20.8      63 female_3rd_yes
# 3 yes      3rd            male     0.417     22.4      45 male_3rd_yes
# 4 no       3rd            female   1         23.4      48 female_3rd_no
Then we can plot the points using geom_pointrange (we already took care of the correct ordering when we set the factor levels!):
library(ggplot2)

ggplot(df_age, aes(x = age_mean, y = group_label)) +
  geom_pointrange(aes(xmin = age_min, xmax = age_max)) +
  theme_bw() +
  labs(x = "Age / Years", y = "")
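As a side note on the ordering step: the order of the labels on the axis follows the order of the factor levels, not the order of the rows. Here is a minimal base R sketch of that idea. The data frame toy and its values are made up for illustration and are not the Titanic results. It shows two ways to get the same level order: sorting the rows and fixing the levels in order of appearance (roughly what arrange() followed by forcats::as_factor() does above), or letting reorder() derive the levels from the mean directly.

```r
# Toy data (made-up values, purely illustrative)
toy <- data.frame(
  group_label = c("a", "b", "c"),
  age_mean    = c(30, 20, 40)
)

# Option 1: sort the rows by age_mean, then set the factor levels
# in the order in which the labels now appear
toy_sorted <- toy[order(toy$age_mean), ]
toy_sorted$group_label <- factor(
  toy_sorted$group_label,
  levels = unique(toy_sorted$group_label)
)
levels(toy_sorted$group_label)

# Option 2: let reorder() compute the level order from age_mean
reordered <- reorder(toy$group_label, toy$age_mean)
levels(reordered)
```

Both options should give the levels in increasing order of age_mean. Since ggplot2 draws the first level of a discrete y-axis at the bottom, the group with the highest mean ends up at the top, so the dots read in decreasing age from top to bottom.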