I have a data frame with 3 columns that I want to display using geom_raster, where the x-axis, y-axis, and fill are all categorical. The goal with my original dataset is to show overlapping data with fills and to show where no data exists along the x-axis. For some combinations of x and y, multiple fill values may apply; however, geom_raster only fills the cell with the last occurrence. Instead, I would like to stack all fills, similar to a percent stacked bar plot: if only one fill applies, the entire cell gets that color, and if 3 fill values apply, the cell is split equally among the three colors, i.e. 33% each.
For my example dataset, I simulated 2 triathlon races between 26 people, where each result was assigned a place of first, second, or last (156 individual results = 26 people x 3 events x 2 races). I then simulated non-finishes by setting 26 random results to NA. Next, I create a raster that shows each person's place for each event across the combined races, colored by whether they finished 'first', 'second', or 'last'. Cells are empty (white) if the person did not place in the event. With the person on the x-axis and the 3 triathlon events on the y-axis, this creates a raster with 78 cells.
My goal for this minimal example is to stack the fills of each cell based on the person's finishes in each race. Since this dataset contains 2 triathlon races, there are 2 possible fills for each cell. If a person places the same in both races for a given event, the cell should be a single color. If they place differently, the cell should be divided in 2, with one color for each place. Currently, the code fills the entire cell based on the last occurring value (e.g. their place in the second race). For example, person 'A' placed second in biking in the first race and did not finish in the second race, placed first in running in the first race and last in the second race, and placed second in swimming in the first race and did not place in the second race. However, the raster shows that person A did not finish swimming, placed last in running, and did not finish biking. How can I stack fills for cells in a geom_raster?
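To make the overplotting concrete, here is a small base R sketch on a hypothetical toy data frame (not my real data). It counts how many rows land in each person x event cell and picks out the last row per cell, which matches what I see the raster displaying. The names `toy`, `cell_key`, `rows_per_cell`, and `visible` are just for illustration.

```r
# Hypothetical toy rows: person A has two results for the same event
toy <- data.frame(
  person = c("A", "A", "B"),
  event  = c("run", "run", "run"),
  place  = c("first", "last", "second"),
  stringsAsFactors = FALSE
)

# Count the rows in each person x event cell; a count above 1 means
# several values compete for the same cell
cell_key <- paste(toy$person, toy$event)
rows_per_cell <- table(cell_key)

# Keep only the last row of each cell, i.e. the value that appears
# to end up on top in my raster
visible <- toy[!duplicated(cell_key, fromLast = TRUE), ]

print(rows_per_cell)
print(visible)
```

For person A, only the 'last' result survives in `visible`, which is the behavior I am trying to avoid.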
An important requirement is that every cell stays the same size. If someone competed in 10 races, their cell would have 10 possible fills, while someone who competed in 3 would only have 3, but the overall size of both cells should be identical. Likewise, with the dataset above, if we removed the places where they did not finish (df %>% drop_na()), the cells with 2 values should be the same size as the cells with 1 value.
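To illustrate the equal-size requirement, here is a rough sketch of the geometry I have in mind, on hypothetical toy data. It assumes that drawing rectangles with geom_rect on integer axis positions would be an acceptable substitute for geom_raster, which is my assumption and not something I have confirmed as the right approach. Each cell is one unit tall, and a cell with n values is cut into n slices of height 1/n. The column names `n`, `idx`, `ymin`, and `ymax` are made up for this example.

```r
# Hypothetical toy data: one row per person x event x race result
toy <- data.frame(
  person = c("A", "A", "A", "B", "B", "B"),
  event  = c("run", "run", "swim", "run", "run", "run"),
  place  = c("first", "last", "second", "first", "second", "last"),
  stringsAsFactors = FALSE
)

# Number of values in each person x event cell, and each row's rank within it
toy$n   <- ave(seq_along(toy$place), toy$person, toy$event, FUN = length)
toy$idx <- ave(seq_along(toy$place), toy$person, toy$event, FUN = seq_along)

# Integer positions for the categorical axes; every cell is 1 x 1
person_levels <- sort(unique(toy$person))
event_levels  <- sort(unique(toy$event))
toy$x <- match(toy$person, person_levels)
toy$y <- match(toy$event, event_levels)

# Cut each unit-height cell into n equal horizontal slices
toy$ymin <- toy$y - 0.5 + (toy$idx - 1) / toy$n
toy$ymax <- toy$y - 0.5 + toy$idx / toy$n

# Plot only if ggplot2 is installed (assumption: geom_rect stands in for geom_raster)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy) +
    geom_rect(aes(xmin = x - 0.5, xmax = x + 0.5,
                  ymin = ymin, ymax = ymax, fill = place)) +
    scale_x_continuous(breaks = seq_along(person_levels),
                       labels = person_levels, position = "top") +
    scale_y_continuous(breaks = seq_along(event_levels),
                       labels = event_levels) +
    scale_fill_manual(values = c("first" = "green", "second" = "blue",
                                 "last" = "red")) +
    labs(x = "Person", y = "Event") +
    theme_classic()
  print(p)
}
```

In this toy case, A's run cell is split in 2, B's run cell in 3, and A's swim cell is a single color, yet all three cells cover the same 1 x 1 area. Combinations with no rows (B's swim here) are simply left blank.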
Reproducible Code
library(tidyverse)

# number of races
num_races = 2

# create fake data
# make the random places below reproducible
set.seed(001)
df <- data.frame(
  person = rep(LETTERS, 3*num_races),
  event = rep(c('swim','run','bike'), 26*num_races),
  place = sample(rep(c('first','second','last'), 26*num_races)))

# set some random did-not-finish results
set.seed(001)
df$place[sample(1:156, 26)] <- NA

# Person A has multiple places for running; the raster colors by the last occurring place
df %>%
  filter(person == 'A') %>%
  arrange(event)

# create the raster
df %>%
  ggplot(aes(x=person, y=event)) +
  geom_raster(aes(fill=place)) +
  scale_x_discrete(position="top") +
  scale_y_discrete(expand = c(0,0)) +
  # horizontal lines between the event rows
  geom_hline(yintercept = 1:2 + 0.5, size=0.7) +
  scale_fill_manual(values = c("first"="green","second"="blue",
                               "last"="red"),
                    na.value = "white") +
  labs(x="Person", y="Event") +
  theme_classic() +
  theme(axis.text.y=element_text(size=9),
        axis.title = element_text(size=18),
        panel.border = element_rect(fill=NA, size=0.7))
Modified Graph in Photoshop to show what I want for Person A
I want to do this programmatically for every person. I lowered the opacity of everything else to draw attention to person A, whose cells I modified manually.
Similar Question