Home > front end >  Using R Base to sum a column of a dataframe for each value of a list
Using R Base to sum a column of a dataframe for each value of a list

Time:09-04

I have a dataframe named 2022_Rev that looks sort of like this:

Name   Vendor  Sales
Steve  6       80,000
Annie  4       95,000
Bill   6       45,000
Steve  3       25,000
Bill   2       40,000 
Sam    5       5,000
...    ...    ...

I also have a list of each sales person:

Employees ['Steve', 'Annie', 'Bill', 'Sam', ...]

I want to apply mean() to column sales for each item in the list "Employee". I am supposed to use base R to create a loop that goes through each value in "Employees" and then creates a vector showing the mean for each employee. So far I have:

avgSales = rep(NA, 10)       
for (i in length(Employees)){
  if(Employees[i] == 2022_Rev$Name){
    avgSales[i] = mean(2022_Rev$Sales[i])
  }
}

This is erroring apparently because if can only check one value? I'm not sure how to fix it.

CodePudding user response:

This is not normally the approach we would take in R (i.e. there are better ways to get the mean of a column by group). However, if you want an example of a for loop over the names of the Employees in your list, here is one base R approach. First preallocated a named vector of length as long as your Employees, and then fill it use a for loop:

sales_means = setNames(vector("numeric", length = length(Employees)), Employees)

for(e in Employees) {
  sales_means[e] = mean(`2022_Rev`[`2022_Rev`$Name==e, "Sales"],na.rm=T)
}

Output:

Steve Annie  Bill   Sam 
52500 95000 42500  5000

Input:

`2022_Rev` = structure(list(Name = c("Steve", "Annie", "Bill", "Steve", "Bill", 
"Sam"), Vendor = c(6L, 4L, 6L, 3L, 2L, 5L), Sales = c(80000L, 
95000L, 45000L, 25000L, 40000L, 5000L)), row.names = c(NA, -6L
), class = "data.frame")

Employees = list('Steve', 'Annie', 'Bill', 'Sam')

CodePudding user response:

We can use aggregate to calculate the mean of Sales with respect to Name , then transform your list Employees to data.frame then merge it with the aggregate result to get the values in the list

aggregate(Sales ~ Name , `2022_Rev` , mean) |>
merge(do.call(rbind , Employees) |>
data.frame(Name = _) , by.y = "Name")
  • Output
   Name Sales
1 Annie 95000
2  Bill 42500
3   Sam  5000
4 Steve 52500
  • Related