Home > front end >  Unable to sum across dates (Dplyr)
Unable to sum across dates (Dplyr)

Time:10-19

I have a dataset similar to this:

> dput(df)
structure(list(Date.Bought = structure(c(18262, 18262, 18262, 
18265, 18266, 18266, 18268, 18269, 18270, 18270, 18272, 18273
), class = "Date"), Wasted = c(4L, 1L, 0L, 2L, 2L, 1L, 1L, 0L, 
1L, 1L, 0L, 1L), Ordered = c(12L, 2L, 0L, 4L, 5L, 7L, 3L, 2L, 
1L, 4L, 5L, 1L)), row.names = c(NA, -12L), class = "data.frame")

I want to divide the total wastage by the total ordered by date. So, for example, on 2020-01-01, we see a total of 5 wastages (4 1) and a total of 14 orders (12 2). I want to create a column that divides 5/14 to show the proportion of waste in relation to order by that date. The output for 2020-01-01 should be 36%.

I tried this, but it gives me row-wise calculations (whereas I want to group it by date):

df <- df %>% group_by(Date) %>%
  mutate(all = Wasted/Ordered)

CodePudding user response:

The first column of your df is Date.Bought, not Date. You also want to use summarize instead of mutate in this case. This should work:

df |> 
  group_by(Date.Bought) |> 
  summarise(all = sum(Wasted)/sum(Ordered))
  • Related