I am trying to recreate a chart similar to this one using ggplot2. The chart categorizes historical yields (or spreads) into quartiles and presents them in a stacked bar chart.
I have the following dataset and code. The issue with my chart is that geom_bar appears to be summing all of the values in the Percent column, as opposed to displaying them as a range.
library(tidyverse)
data <- structure(list(date = structure(c(19307, 19310, 19311, 19312,
19313, 19314, 19317, 19318, 19319, 19321, 19307, 19310, 19311,
19312, 19313, 19314, 19317, 19318, 19319, 19321), class = "Date"),
Key = c("10 Year", "10 Year", "10 Year", "10 Year", "10 Year",
"10 Year", "10 Year", "10 Year", "10 Year", "10 Year", "30 Year",
"30 Year", "30 Year", "30 Year", "30 Year", "30 Year", "30 Year",
"30 Year", "30 Year", "30 Year"), Percent = c(3.813, 3.865,
3.799, 3.692, 3.775, 3.818, 3.825, 3.758, 3.706, 3.691, 4.058,
4.058, 3.982, 3.86, 3.89, 3.927, 3.905, 3.83, 3.739, 3.751
), Quartile = structure(c(3L, 4L, 2L, 1L, 2L, 3L, 4L, 2L,
1L, 1L, 4L, 4L, 3L, 2L, 2L, 3L, 2L, 1L, 1L, 1L), levels = c("1",
"2", "3", "4"), class = "factor")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
Key = c("10 Year", "30 Year"), .rows = structure(list(1:10,
11:20), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))
data %>% {
  ggplot(., aes(x = Key, y = Percent, fill = Quartile)) +
    geom_bar(stat = 'identity', position = position_stack())
}
A geom_boxplot would display it as a range. However, the box only covers the interquartile range, not the full min-to-max range shown in the chart that I'm trying to recreate. Also, filling the boxplot by Quartile is much trickier.
data %>% {
  ggplot(., aes(x = Key, y = Percent)) +
    geom_boxplot()
}
Any ideas on how to go about getting this right?
CodePudding user response:
I think you may want to try geom_rect, which takes min and max values for both x and y. We can get the ymin and ymax values by grouping the data on Key and Quartile, creating min and max variables, sorting by Key and Quartile, then using slice to keep just the first row of each group. geom_rect also needs a numeric x value, so we make Key a factor variable that can be converted to numbers later.
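df <- data %>%
  group_by(Key, Quartile) %>%
  # min and max of Percent within each Key/Quartile group
  mutate(min = min(Percent), max = max(Percent)) %>%
  arrange(Key, Quartile) %>%
  # min and max are constant within a group, so one row per group is enough
  slice(1) %>%
  select(Key, Quartile, min, max) %>%
  ungroup() %>%
  mutate(Key = factor(Key))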
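As an alternative sketch of the same step, summarise() collapses each group to a single row directly, so the mutate/arrange/slice/select sequence is not needed. The compact tibble below is an assumption made to keep the example self-contained: it reproduces the Key, Percent, and Quartile values from the question and omits the dates. The name df_alt is only illustrative.

```r
library(dplyr)

# Same Key, Percent and Quartile values as the question's dataset (dates omitted)
data <- tibble(
  Key = rep(c("10 Year", "30 Year"), each = 10),
  Percent = c(3.813, 3.865, 3.799, 3.692, 3.775, 3.818, 3.825, 3.758, 3.706, 3.691,
              4.058, 4.058, 3.982, 3.86, 3.89, 3.927, 3.905, 3.83, 3.739, 3.751),
  Quartile = factor(c(3, 4, 2, 1, 2, 3, 4, 2, 1, 1,
                      4, 4, 3, 2, 2, 3, 2, 1, 1, 1))
)

# One row per Key/Quartile with the range of Percent in that group
df_alt <- data %>%
  group_by(Key, Quartile) %>%
  summarise(min = min(Percent), max = max(Percent), .groups = "drop") %>%
  mutate(Key = factor(Key))
```

This should give the same eight rows and four columns as df above, already sorted by Key and Quartile.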
The dataset looks like this.
  Key     Quartile   min   max
  <fct>   <fct>    <dbl> <dbl>
1 10 Year 1         3.69  3.71
2 10 Year 2         3.76  3.80
3 10 Year 3         3.81  3.82
4 10 Year 4         3.82  3.86
5 30 Year 1         3.74  3.83
6 30 Year 2         3.86  3.90
7 30 Year 3         3.93  3.98
8 30 Year 4         4.06  4.06
Note that the max of one Quartile is not the same as the min of the next Quartile, so there will be gaps in the graph. Also, 30 Year Quartile 4 has the same value for min and max, so its rectangle has zero height and will not show on the graph. Adjust these values accordingly if you need the bars to appear continuous.
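One possible way to make that adjustment, offered as a sketch rather than the only option, is to start each quartile at the previous quartile's max within the same Key. This is an assumption about what "continuous" should mean here: the boundaries become the observed maxima, not true quartile cut points. The name df_continuous is illustrative, and the data is rebuilt compactly (dates omitted) so the example runs on its own.

```r
library(dplyr)

# Same Key, Percent and Quartile values as the question's dataset (dates omitted)
data <- tibble(
  Key = rep(c("10 Year", "30 Year"), each = 10),
  Percent = c(3.813, 3.865, 3.799, 3.692, 3.775, 3.818, 3.825, 3.758, 3.706, 3.691,
              4.058, 4.058, 3.982, 3.86, 3.89, 3.927, 3.905, 3.83, 3.739, 3.751),
  Quartile = factor(c(3, 4, 2, 1, 2, 3, 4, 2, 1, 1,
                      4, 4, 3, 2, 2, 3, 2, 1, 1, 1))
)

# Range of Percent per Key/Quartile (same content as the df built above)
df <- data %>%
  group_by(Key, Quartile) %>%
  summarise(min = min(Percent), max = max(Percent), .groups = "drop") %>%
  mutate(Key = factor(Key))

# Close the gaps: each quartile starts where the previous one ended.
# lag(max) is NA for the first quartile of each Key, so coalesce() keeps its own min.
df_continuous <- df %>%
  group_by(Key) %>%
  arrange(Quartile, .by_group = TRUE) %>%
  mutate(min = coalesce(lag(max), min)) %>%
  ungroup()
```

With this change, 30 Year Quartile 4 spans from the Quartile 3 max up to its own max, so it gets a visible height. df_continuous can be passed to the plotting code in place of df.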
Since we turned Key into a factor, calling as.numeric() on Key gives its integer codes, which are 1 and 2 in this example. Each rectangle is centered on that code, and we set its width by subtracting and adding a constant. Here I used 0.25, so each bar is 0.5 wide. We use scale_x_continuous to add the labels associated with Key.
ggplot(df) +
  geom_rect(aes(xmin = as.numeric(Key) - 0.25, xmax = as.numeric(Key) + 0.25, ymin = min, ymax = max, fill = Quartile)) +
  scale_x_continuous(breaks = seq(from = min(as.numeric(df$Key)), to = max(as.numeric(df$Key))), labels = unique(df$Key)) +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank())
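To see why the breaks and labels line up, here is a small base R sketch of how a factor maps to numeric codes. The vector key and the variables breaks and labels are illustrative names, not part of the code above.

```r
# A factor stores integer codes that index into its sorted levels
key <- factor(c("10 Year", "10 Year", "30 Year"))

as.numeric(key)  # 1 1 2
levels(key)      # "10 Year" "30 Year"

# Breaks at the integer codes, labelled with the matching level names
breaks <- seq_along(levels(key))
labels <- levels(key)
```

Because the codes always follow the order of levels(), passing levels(df$Key) as the labels would be a variant that does not depend on the row order of df, whereas unique(df$Key) relies on df being sorted by Key.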
I hope this has gotten close to what you are looking for.
CodePudding user response:
Try using this boxplot.
ggplot(data) +
  geom_boxplot(aes(Percent, Key, col = Quartile)) +
  coord_flip()