I'm reading multiple files into a dataframe each. E.g. files for 'apples', 'oranges' and 'lemons'. Each of these files (=dataframes) contains data like this:
Def Q1-2021 Q2-2021 Q3-2021 ID
1 price 15 17 9 apples
2 stock 1270 3575 175 apples
3 revenue 57866 66287 21755 apples
4 profit 22156 34187 9253 apples
There are several more quarters of data, but you get the idea.
Now I'd like to make a graph that displays the price as a line chart for all dataframes (ie for apples, oranges, lemons, etc.).
I was considering rbind-ing all dataframes together, but since not all products have data for each quarter, that won't work.
Now I'm stuck and don't know how to get any further. Can someone help me out please?
CodePudding user response:
If you convert your data into "long" format, then it will be easy to rbind
and plot as desired. I just made up some data for oranges to help illustrate.
First I would put each data.frame
into a list
and then you can apply()
/purrr::map()
the manipulations on all at the same time and then combine.
It will probably help to reformat your date information also to make it plot better so you'll see I put year before quarter, but there are many other good options here depending on the structure of your actual data and ultimate goals.
Finally, you mentioned only plotting price, but I'm showing here how to plot all the different value types in case that was of interest. If not, you can just un-comment the line that filter()
s just to price
.
library(tidyverse)
apples <- structure(list(Def = c("price", "stock", "revenue", "profit"), `Q1-2021` = c(15L, 1270L, 57866L, 22156L), `Q2-2021` = c(17L, 3575L, 66287L, 34187L), `Q3-2021` = c(9L, 175L, 21755L, 9253L), ID = c("apples", "apples", "apples", "apples")), class = "data.frame", row.names = c(NA, -4L))
oranges <- structure(list(Def = c("price", "stock", "revenue", "profit"), `Q2-2021` = c(14, 2860, 53030, 27350), `Q3-2021` = c(5, 105, 13053, 5552), ID = c("oranges", "oranges", "oranges", "oranges")), class = "data.frame", row.names = c(NA, -4L))
l <- list(apples = apples,
oranges = oranges)
merged_data <- l %>%
map_dfr(~.x %>%
pivot_longer(starts_with("Q"), names_to = "quarter") %>%
separate(quarter, into = c("quarter", "year")) %>%
unite("date", year, quarter))
merged_data
#> # A tibble: 20 × 4
#> Def ID date value
#> <chr> <chr> <chr> <dbl>
#> 1 price apples 2021_Q1 15
#> 2 price apples 2021_Q2 17
#> 3 price apples 2021_Q3 9
#> 4 stock apples 2021_Q1 1270
#> 5 stock apples 2021_Q2 3575
#> 6 stock apples 2021_Q3 175
#> 7 revenue apples 2021_Q1 57866
#> 8 revenue apples 2021_Q2 66287
#> 9 revenue apples 2021_Q3 21755
#> 10 profit apples 2021_Q1 22156
#> 11 profit apples 2021_Q2 34187
#> 12 profit apples 2021_Q3 9253
#> 13 price oranges 2021_Q2 14
#> 14 price oranges 2021_Q3 5
#> 15 stock oranges 2021_Q2 2860
#> 16 stock oranges 2021_Q3 105
#> 17 revenue oranges 2021_Q2 53030
#> 18 revenue oranges 2021_Q3 13053
#> 19 profit oranges 2021_Q2 27350
#> 20 profit oranges 2021_Q3 5552
merged_data %>%
# filter(Def == "price") %>% # this will just plot price
ggplot(aes(date, value, color = ID))
geom_point()
geom_line(aes(group = ID))
facet_wrap(~Def, scales = "free_y")
Created on 2022-11-21 with reprex v2.0.2