How to sum data from multiple rows and columns according to unique variable in R?-CodePudding

I am trying to sum the average total sales of each employee in this data frame (sorry for format, i'm not sure how else to get a .csv uploaded here). For example, Tim sold ten vehicles day 1, nine day 2, and eleven day 3, for an average of 10 per day. Is there a way to grab all of the columns each time the employee's name shows up, then sum and average the total sales?

CodePudding user response：

How about this:

library(dplyr)
dat <- tibble::tribble(
  ~Employee,    ~Day,   ~Car_Sales, ~Van_Sales, ~Truck_Sales,
"Tim",  "1/1",  5,  2,  3,
"Tim",  "1/2",  4,  2,  3,
"Tim",  "1/3",  7,  1,  3,
"Craig", "1/1", 2,  6,  1,
"Craig", "1/2", 5,  7,  2,
"Samantha", "1/1",  4,  3,  2)


dat %>% 
  rowwise() %>% 
  mutate(total_sales = sum(c_across(-c(1,2)))) %>% 
  group_by(Employee) %>% 
  summarise(average = mean(total_sales))
#> # A tibble: 3 × 2
#>   Employee average
#>   <chr>      <dbl>
#> 1 Craig       11.5
#> 2 Samantha     9  
#> 3 Tim         10

^{Created on 2022-06-27 by the reprex package (v2.0.1)}

CodePudding user response：

dat <- tibble::tribble(
  ~Employee,    ~Day,   ~Car_Sales, ~Van_Sales, ~Truck_Sales,
"Tim",  "1/1",  5,  2,  3,
"Tim",  "1/2",  4,  2,  3,
"Tim",  "1/3",  7,  1,  3,
"Craig", "1/1", 2,  6,  1,
"Craig", "1/2", 5,  7,  2,
"Samantha", "1/1",  4,  3,  2)

dat %>% mutate(sales = Car_Sales   Van_Sales   Truck_Sales) %>% group_by(Employee) %>% summarise(Avg = mean(sales))

Output should look like this:

Employee	Avg
Craig	11.5
Samantha	9.0
Tim	10.0

I hope this is what you mean.

CodePudding user response：

One option would be to put into a long format, then get the mean by summing all the data and divide by the number of days (i.e., using n_distinct).

library(tidyverse)

df %>%
  pivot_longer(ends_with("Sales")) %>%
  group_by(Employee) %>%
  summarise(Avg = sum(value) / n_distinct(Day))

Output

  Employee   Avg
  <chr>    <dbl>
1 Craig     11.5
2 Samantha   9  
3 Tim       10

Data

df <- structure(list(Employee = c("Tim", "Tim", "Tim", "Craig", "Craig", 
"Samantha"), Day = c("1/1", "1/2", "1/3", "1/1", "1/2", "1/1"
), Car_Sales = c(5, 4, 7, 2, 5, 4), Van_Sales = c(2, 2, 1, 6, 
7, 3), Truck_Sales = c(3, 3, 3, 1, 2, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))