Using R Base to sum a column of a dataframe for each value of a list

I have a dataframe named 2022_Rev that looks sort of like this:

Name   Vendor  Sales
Steve  6       80,000
Annie  4       95,000
Bill   6       45,000
Steve  3       25,000
Bill   2       40,000 
Sam    5       5,000
...    ...    ...

I also have a list of the salespeople:

Employees ['Steve', 'Annie', 'Bill', 'Sam', ...]

I want to apply mean() to the Sales column for each item in the list Employees. I am supposed to use base R to write a loop that goes through each value in Employees and builds a vector holding the mean for each employee. So far I have:

avgSales = rep(NA, 10)       
for (i in length(Employees)){
  if(Employees[i] == 2022_Rev$Name){
    avgSales[i] = mean(2022_Rev$Sales[i])
  }
}

This errors, apparently because if() can only check one value? I'm not sure how to fix it.

CodePudding user response：

This is not normally the approach we would take in R (i.e. there are better ways to get the mean of a column by group). However, if you want an example of a for loop over the names in your Employees list, here is one base R approach. First preallocate a named vector as long as Employees, then fill it using a for loop:

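# Preallocate a named numeric vector with one slot per employee
sales_means = setNames(vector("numeric", length = length(Employees)), Employees)

for (e in Employees) {
  # Mean of Sales over the rows whose Name matches the current employee
  sales_means[e] = mean(`2022_Rev`[`2022_Rev`$Name == e, "Sales"], na.rm = TRUE)
}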

Output:

Steve Annie  Bill   Sam 
52500 95000 42500  5000

Input:

`2022_Rev` = structure(list(Name = c("Steve", "Annie", "Bill", "Steve", "Bill", 
"Sam"), Vendor = c(6L, 4L, 6L, 3L, 2L, 5L), Sales = c(80000L, 
95000L, 45000L, 25000L, 40000L, 5000L)), row.names = c(NA, -6L
), class = "data.frame")

Employees = list('Steve', 'Annie', 'Bill', 'Sam')
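For comparison, the loop from the question can also be repaired with only a few changes. This is a sketch rather than the approach of the answer above; it rebuilds the sample data (without the Vendor column, which the calculation does not need) so that it runs on its own. Three things change: seq_along() visits every index, whereas for (i in length(Employees)) runs only once with i equal to the length; the comparison with the Name column returns one TRUE/FALSE per row, so it is collapsed with any() before being handed to if(); and that same logical vector, not i, selects the rows of Sales.

```r
# Sample data mirroring the Input block above (Vendor omitted for brevity)
`2022_Rev` <- data.frame(
  Name  = c("Steve", "Annie", "Bill", "Steve", "Bill", "Sam"),
  Sales = c(80000, 95000, 45000, 25000, 40000, 5000)
)
Employees <- list("Steve", "Annie", "Bill", "Sam")

# One slot per employee instead of a hard-coded length of 10
avgSales <- rep(NA_real_, length(Employees))

# seq_along() yields 1, 2, ..., n so every employee is visited
for (i in seq_along(Employees)) {
  # Logical vector marking the rows that belong to this employee
  rows <- `2022_Rev`$Name == Employees[[i]]
  # any() collapses the vector to the single TRUE/FALSE that if() expects
  if (any(rows)) {
    avgSales[i] <- mean(`2022_Rev`$Sales[rows])
  }
}

# Label the results with the employee names
names(avgSales) <- unlist(Employees)
avgSales
```

An employee with no matching rows keeps the initial NA, because the if() branch is skipped for them.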

CodePudding user response：

We can use aggregate to calculate the mean of Sales by Name, convert the list Employees to a data.frame, and then merge it with the aggregate result so that only the names in the list are kept. Note that the _ placeholder used with the native pipe requires R 4.2 or later.

aggregate(Sales ~ Name, `2022_Rev`, mean) |>
  merge(do.call(rbind, Employees) |>
          data.frame(Name = _), by.y = "Name")

Output:

   Name Sales
1 Annie 95000
2  Bill 42500
3   Sam  5000
4 Steve 52500
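If a named vector is preferred over a data.frame, tapply() is another base R option for a grouped mean. The sketch below is not taken from either response; it assumes the same sample data, rebuilt here so that the block runs on its own, and the names group_means and sales_means are only illustrative.

```r
# Sample data mirroring the Input block from the first response (Vendor omitted)
`2022_Rev` <- data.frame(
  Name  = c("Steve", "Annie", "Bill", "Steve", "Bill", "Sam"),
  Sales = c(80000, 95000, 45000, 25000, 40000, 5000)
)
Employees <- list("Steve", "Annie", "Bill", "Sam")

# Mean of Sales within each Name; the result is named by employee
group_means <- tapply(`2022_Rev`$Sales, `2022_Rev`$Name, mean)

# Index by name to return the means in the order of the Employees list
sales_means <- group_means[unlist(Employees)]
sales_means
```

Here the grouping is done in a single vectorised call and the Employees list is used only to select and order the results, so no explicit loop is needed.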