I am trying to sum the average total sales of each employee in this data frame (sorry for format, i'm not sure how else to get a .csv uploaded here). For example, Tim sold ten vehicles day 1, nine day 2, and eleven day 3, for an average of 10 per day. Is there a way to grab all of the columns each time the employee's name shows up, then sum and average the total sales?
CodePudding user response:
How about this:
library(dplyr)
dat <- tibble::tribble(
~Employee, ~Day, ~Car_Sales, ~Van_Sales, ~Truck_Sales,
"Tim", "1/1", 5, 2, 3,
"Tim", "1/2", 4, 2, 3,
"Tim", "1/3", 7, 1, 3,
"Craig", "1/1", 2, 6, 1,
"Craig", "1/2", 5, 7, 2,
"Samantha", "1/1", 4, 3, 2)
dat %>%
rowwise() %>%
mutate(total_sales = sum(c_across(-c(1,2)))) %>%
group_by(Employee) %>%
summarise(average = mean(total_sales))
#> # A tibble: 3 × 2
#> Employee average
#> <chr> <dbl>
#> 1 Craig 11.5
#> 2 Samantha 9
#> 3 Tim 10
Created on 2022-06-27 by the reprex package (v2.0.1)
CodePudding user response:
dat <- tibble::tribble(
~Employee, ~Day, ~Car_Sales, ~Van_Sales, ~Truck_Sales,
"Tim", "1/1", 5, 2, 3,
"Tim", "1/2", 4, 2, 3,
"Tim", "1/3", 7, 1, 3,
"Craig", "1/1", 2, 6, 1,
"Craig", "1/2", 5, 7, 2,
"Samantha", "1/1", 4, 3, 2)
dat %>% mutate(sales = Car_Sales Van_Sales Truck_Sales) %>% group_by(Employee) %>% summarise(Avg = mean(sales))
Output should look like this:
Employee | Avg |
---|---|
Craig | 11.5 |
Samantha | 9.0 |
Tim | 10.0 |
I hope this is what you mean.
CodePudding user response:
One option would be to put into a long format, then get the mean by summing all the data and divide by the number of days (i.e., using n_distinct
).
library(tidyverse)
df %>%
pivot_longer(ends_with("Sales")) %>%
group_by(Employee) %>%
summarise(Avg = sum(value) / n_distinct(Day))
Output
Employee Avg
<chr> <dbl>
1 Craig 11.5
2 Samantha 9
3 Tim 10
Data
df <- structure(list(Employee = c("Tim", "Tim", "Tim", "Craig", "Craig",
"Samantha"), Day = c("1/1", "1/2", "1/3", "1/1", "1/2", "1/1"
), Car_Sales = c(5, 4, 7, 2, 5, 4), Van_Sales = c(2, 2, 1, 6,
7, 3), Truck_Sales = c(3, 3, 3, 1, 2, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))