I need to create the thick blue line in R. I have the values in red and in yellow for different companies. I need to write code that does the following: whatever the sum of yellow and red was in 2022 for company x, the blue line of company x should start at that value and then increase by 2% to 3% every year (which is why it is so thick). Can anyone help, please? An example would be a company that in 2022 has 20 for red and 10 for yellow. In this case, the blue line should start at 30 in 2022, be 30 * 1.02 (or 30 * 1.03) in 2023, and so on. Sorry for not including a dataset, but I am not sure whether I can legally share it. If anyone would be so kind as to help, that would be great.
Example:
A 2022 100 Red
A 2022 20 Yellow
A 2022 120 (100 + 20) Blue
A 2023 100 Red
A 2023 50 Yellow
A 2023 120 * 1.02 Blue
A 2024 130 Red
A 2024 55 Yellow
A 2024 120 * 1.02 * 1.02 Blue
With every year that passes, the original 2022 sum of yellow and red increases by another 2%.
CodePudding user response:
If d is your data (say, with multiple companies), then getting the blue data for each company is straightforward:
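library(dplyr)

# Take each company's 2022 total (red + yellow) as the starting value,
# then grow it by 2% for every year after 2022
inner_join(
  d,
  d %>%
    filter(y == 2022) %>%
    mutate(blue = red + yellow) %>%
    select(company, blue),
  by = "company"
) %>%
  mutate(blue = blue * (1.02^(y - 2022)))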
# A tibble: 6 x 5
company y red yellow blue
<chr> <int> <dbl> <dbl> <dbl>
1 A 2022 100 20 120
2 A 2023 100 50 122.
3 A 2024 130 55 125.
4 B 2022 200 15 215
5 B 2023 205 18 219.
6 B 2024 210 30 224.
Input Data:
d = tibble(
company=c(rep("A",3),rep("B",3)),
y = rep(2022:2024,2),
red= c(100, 100, 130, 200, 205, 210),
yellow = c(20, 50,55, 15,18, 30)
)
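The question asks for a band between 2% and 3% growth, while the snippet above uses 2% only. A minimal sketch of one way to get both edges per company, assuming exactly one 2022 row per company; the column names base, blue_low and blue_high are hypothetical and chosen here for illustration only.

```r
library(dplyr)

# Same toy data as in the example above
d <- tibble(
  company = c(rep("A", 3), rep("B", 3)),
  y = rep(2022:2024, 2),
  red = c(100, 100, 130, 200, 205, 210),
  yellow = c(20, 50, 55, 15, 18, 30)
)

blue_band <- d %>%
  group_by(company) %>%
  mutate(
    # 2022 total for this company (assumes exactly one 2022 row per company)
    base = (red + yellow)[y == 2022],
    # lower edge: 2% growth per year; upper edge: 3% growth per year
    blue_low = base * 1.02^(y - 2022),
    blue_high = base * 1.03^(y - 2022)
  ) %>%
  ungroup()

blue_band
```

Grouping by company avoids the join, at the cost of failing if a company has no 2022 row or more than one.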
Here is an extended example for plotting, using fake data that tries to approximate the plot in the original post. If d is your data for red and yellow (see below for the structure/generation of d), then you can:
- Create the blue data
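# starting value: red + yellow in 2022
base = sum(d$value[d$year == 2022])
# lower edge of the band grows 2% per year, upper edge 3% per year
blue_data = tibble(year = 2022:2040) %>%
  mutate(value = base * (1.02^(year - 2022)),
         group = "blue",
         plotmax = base * (1.03^(year - 2022)),
         plotmin = value)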
- Manipulate d and blue_data to create plot data
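# create plot data: each group becomes a ribbon between plotmin and plotmax
plot_data <- rbind(
  d %>% mutate(
    # top edge: red value for the red rows, red + yellow for the yellow rows
    plotmax = c(
      d %>% filter(group == "red") %>% pull(value),
      d %>% filter(year >= 2021) %>% group_by(year) %>% summarize(value = sum(value)) %>% pull(value)
    ),
    # bottom edge: 0 for the red rows, the red value for the yellow rows (stacked on red)
    plotmin = c(
      rep(0, times = nrow(d %>% filter(group == "red"))),
      d %>% filter(year >= 2021, group == "red") %>% pull(value)
    )
  ),
  blue_data
)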
- Plot the data in plot_data
library(ggplot2)

manual_values = c("red" = "red", "blue" = "blue", "yellow" = "yellow")
ggplot(plot_data, aes(x = year, color = group, fill = group)) +
  geom_ribbon(aes(ymin = plotmin, ymax = plotmax)) +
  scale_color_manual(values = manual_values) +
  scale_fill_manual(values = manual_values) +
  theme(legend.position = 'bottom')
Input data (d):
d <- structure(list(year = c(2015L, 2016L, 2017L, 2018L, 2019L, 2020L,
2021L, 2022L, 2023L, 2024L, 2025L, 2026L, 2027L, 2028L, 2029L,
2030L, 2031L, 2032L, 2033L, 2034L, 2035L, 2036L, 2037L, 2038L,
2039L, 2040L, 2021L, 2022L, 2023L, 2024L, 2025L, 2026L, 2027L,
2028L, 2029L, 2030L, 2031L, 2032L, 2033L, 2034L, 2035L, 2036L,
2037L, 2038L, 2039L, 2040L), value = c(220, 242.857142857143,
265.714285714286, 288.571428571429, 311.428571428571, 334.285714285714,
357.142857142857, 380, 360, 352.941176470588, 345.882352941176,
338.823529411765, 331.764705882353, 324.705882352941, 317.647058823529,
310.588235294118, 303.529411764706, 296.470588235294, 289.411764705882,
282.352941176471, 275.294117647059, 268.235294117647, 261.176470588235,
254.117647058824, 247.058823529412, 240, 0, 5, 10, 15, 20, 25,
30, 35, 40, 45, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50), group = c("red",
"red", "red", "red", "red", "red", "red", "red", "red", "red",
"red", "red", "red", "red", "red", "red", "red", "red", "red",
"red", "red", "red", "red", "red", "red", "red", "yellow", "yellow",
"yellow", "yellow", "yellow", "yellow", "yellow", "yellow", "yellow",
"yellow", "yellow", "yellow", "yellow", "yellow", "yellow", "yellow",
"yellow", "yellow", "yellow", "yellow")), row.names = c(NA, -46L
), class = c("tbl_df", "tbl", "data.frame"))
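The plotting example above covers a single series, whereas the question mentions several companies. One possible way to draw the blue band per company is to facet the plot. This is only a sketch on toy data in the wide layout (company, y, red, yellow); the names d_wide, band, ymin and ymax are hypothetical, and it assumes exactly one 2022 row per company.

```r
library(dplyr)
library(ggplot2)

# Toy data: one row per company and year
d_wide <- tibble(
  company = c(rep("A", 3), rep("B", 3)),
  y = rep(2022:2024, 2),
  red = c(100, 100, 130, 200, 205, 210),
  yellow = c(20, 50, 55, 15, 18, 30)
)

# Lower (2%) and upper (3%) edge of the blue band, computed per company
band <- d_wide %>%
  group_by(company) %>%
  mutate(
    base = (red + yellow)[y == 2022],
    ymin = base * 1.02^(y - 2022),
    ymax = base * 1.03^(y - 2022)
  ) %>%
  ungroup()

# One panel per company; the ribbon spans the 2% to 3% growth range
p <- ggplot(band, aes(x = y)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax), fill = "blue", color = "blue") +
  facet_wrap(~company, scales = "free_y") +
  labs(x = "year", y = "value")
p
```

With only three years of data the band is narrow; it widens as the 2% and 3% paths diverge over a longer horizon.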