Creating a Cumulative Sum Plot using ggplot with duplicate x values

Time:03-02

In my hypothetical example, people order ice cream at a stand, and each time an order is placed, the month of the order and the number of orders placed are recorded. Each row represents a unique person who placed an order. For each flavor of ice cream, I want to plot the cumulative number of orders over the months. For instance, if a total of 3 Vanilla orders were placed in April and 4 in May, the graph should show one data point at 3 for April and one at 7 for May.

The issue I am running into is each row is being plotted separately (so there would be 3 separate points at April as opposed to just 1).

My secondary issue is that my dates are not in chronological order on my graph. I thought converting the Month column to Date format would fix this but it doesn't seem to.

Here is my code below:

library(lubridate)
library(ggplot2)

Flavor <- c("Vanilla", "Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","chocolate","chocolate","chocolate")
Month <- c("1-Jun-21", "1-May-19", "1-May-19","1-Apr-19", "1-Apr-19","1-Apr-19","1-Apr-19", "1-Mar-19", "1-Mar-19", "1-Mar-19","1-Mar-19", "1-Apr-19", "1-Mar-19", " 1-Apr-19", " 1-Jan-21", "1-May-19", "1-May-19","1-May-19","1-May-19","1-Jun-19","2-September-19", "1-September-19","1-September-19","1-December-19","1-May-19","1-May-19","1-Jun-19")
Orders <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2)           

data <- data.frame(Flavor,Month,Orders)
data$Month <- dmy(data$Month)
str(data)

data2 <- data[data$Flavor == "Vanilla",]
ggplot(data=data2, aes(x=Month, y=cumsum(Orders))) + geom_point()

CodePudding user response:

In these situations, it's usually best to pre-compute your desired summary and send that to ggplot, rather than messing around with ggplot's summary functions. I've also added a geom_line() for clarity.

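library(dplyr)
library(ggplot2)

data %>% 
  # one row per flavor and month, holding that month's total
  group_by(Flavor, Month) %>% 
  summarize(Orders = sum(Orders)) %>% 
  # running total within each flavor, in chronological order
  group_by(Flavor) %>% 
  arrange(Month) %>% 
  mutate(Orders = cumsum(Orders)) %>% 
  ggplot(data = ., aes(x=Month, y=Orders, color = Flavor)) + geom_point() + geom_line()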
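
To see why the pre-computation matters, here is a minimal base R sketch on a made-up miniature of the data (the `orders` data frame and its values are hypothetical and only mirror the "3 in April, 4 in May" example from the question). Calling cumsum() on the raw rows gives one running value per row in row order, whereas summing per month first gives a single chronological point per month.

```r
# Hypothetical miniature of the question's data: one row per order,
# deliberately not in chronological order.
orders <- data.frame(
  Month  = as.Date(c("2019-05-01", "2019-04-01", "2019-05-01", "2019-04-01",
                     "2019-05-01", "2019-04-01", "2019-05-01")),
  Orders = rep(1, 7)
)

# cumsum() on the raw rows: one value per row, accumulated in row order.
# Plotting this against Month puts several points on the same date.
raw_running <- cumsum(orders$Orders)

# Collapse to one row per month, sort by date, then accumulate.
monthly <- aggregate(Orders ~ Month, data = orders, FUN = sum)
monthly <- monthly[order(monthly$Month), ]
monthly$Cumulative <- cumsum(monthly$Orders)

print(monthly)
```

Here `raw_running` has seven values (one per order), while `monthly` has two rows with cumulative totals of 3 for April and 7 for May, which is the shape the plot needs. The dplyr pipeline above does the same thing per flavor.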
