How to loop over groups and columns at the same time in R?

I have a dataframe which looks like this example, just much larger:

Name <- c('Peter', 'Peter', 'Peter', 'Ben', 'Ben', 'Ben', 'Mary', 'Mary', 'Mary')
date <- c('2020-01-01', '2020-01-02', '2020-01-03','2020-01-01', '2020-01-02', '2020-01-03','2020-01-01', '2020-01-02', '2020-01-03')
var1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6 , 0.7)
var2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4 , 0.2)
var3 <- c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5 , 0.2)
df <- data.frame(Name, date, var1, var2, var3)

I want to loop over the grouped names and columns to apply a function. I can do it for one group at a time with apply, but not over all groups:

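# Summarise each var column for Peter's rows (rows 1 to 3).
# MARGIN = 2 applies the function per column; the result is named res to avoid masking base::list().
res <- apply(df[1:3, 3:5], 2, function(x) {
  list(summary(x))
})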

The output in this case (i.e., for the name "Peter") is a list with the elements "var1", "var2", and "var3". The desired output is a list with one element per "Name", each containing the elements "var1", "var2", and "var3" (or the other way round: one element per "var", each containing all "Name" elements).

CodePudding user response:

I suggest looking at the package dplyr, which has a lot of handy functions for this kind of data wrangling. You haven't explained what exactly you're trying to do, but in general:

First you use the command group_by() to group your dataframe by the values in one column. It looks like you want to use the column Name.
To keep the same number of rows and compute new values you use the command mutate().
To run summary functions that return one row per group, use the function summarise().
You can chain these commands together using the pipe operator %>%.
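To make the difference between mutate() and summarise() concrete, here is a minimal sketch using the data from the question (rebuilt with rep() so the block runs on its own). The names centered, means, and var1_centered are made up for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  Name = rep(c("Peter", "Ben", "Mary"), each = 3),
  date = rep(c("2020-01-01", "2020-01-02", "2020-01-03"), times = 3),
  var1 = c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2),
  var3 = c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
)

# mutate(): keeps every input row; mean(var1) is computed within each Name group
centered <- df %>%
  group_by(Name) %>%
  mutate(var1_centered = var1 - mean(var1)) %>%
  ungroup()

# summarise(): collapses each Name group to a single row
means <- df %>%
  group_by(Name) %>%
  summarise(var1_mean = mean(var1))

nrow(centered)  # same number of rows as df
nrow(means)     # one row per distinct Name
```

Calling ungroup() after the grouped mutate() is a common habit so that later steps do not silently keep operating per group.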

So in your case, using the data you provided, if for each group you wanted to get the minimum value of var1, the mean of var2, and the maximum of var3, you would run:

library(dplyr)

df %>%
  mutate(var1 = as.numeric(var1),
         var2 = as.numeric(var2),
         var3 = as.numeric(var3)) %>%
  group_by(Name) %>%
  summarise(var1_min = min(var1),
            var2_mean = mean(var2),
            var3_max = max(var3))

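The as.numeric() calls are only a safeguard: in the data as posted, var1, var2, and var3 are already numeric, so the conversion changes nothing and would matter only if the columns had been read in as strings. Then we group by Name. Finally, we create a summary data.frame with one row per name and three new columns, var1_min, var2_mean, and var3_max.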
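If you need the nested list described in the question rather than a summary data frame, one possible base R sketch (no dplyr involved) is to split() the columns by Name and run summary() over each piece. The names vars, by_name, and by_var are made up for illustration; the data frame is the one from the question, rebuilt so the block runs on its own.

```r
# Same data as in the question, rebuilt compactly
df <- data.frame(
  Name = rep(c("Peter", "Ben", "Mary"), each = 3),
  date = rep(c("2020-01-01", "2020-01-02", "2020-01-03"), times = 3),
  var1 = c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2),
  var3 = c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
)

vars <- c("var1", "var2", "var3")

# Name -> var: split() gives one data frame per Name,
# and the inner lapply() runs summary() on each of its columns
by_name <- lapply(split(df[vars], df$Name), function(g) lapply(g, summary))

# var -> Name: for each column, split its values by Name and summarise each piece
by_var <- lapply(df[vars], function(col) lapply(split(col, df$Name), summary))

by_name$Peter$var1  # summary of var1 for Peter
by_var$var1$Peter   # same values, reached the other way round
```

Any function that takes a numeric vector can stand in for summary() here; the two layouts differ only in which level of the list comes first.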

This is a helpful resource for more.

CodePudding user response:

In addition to @Christopher Belanger's answer, you might also consider mutate(across()) or summarize(across()), which make it easy to apply the same function or transformation to multiple columns.

An example:

library(dplyr)

df %>%
  group_by(Name) %>%
  summarize(across(var1:var3, ~ mean(as.numeric(.x), na.rm = TRUE)))

Output:

  Name   var1  var2  var3
  <chr> <dbl> <dbl> <dbl>
1 Ben   0.467 0.367 0.4  
2 Mary  0.567 0.233 0.4  
3 Peter 0.567 0.367 0.567
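
The same pattern extends to several functions at once. As a minimal sketch, assuming dplyr 1.0.0 or later, you can pass a named list of functions to across(); with the default naming scheme the result columns are named after the column and the function (var1_min, var1_mean, and so on). The name res is made up for illustration, and the data frame is the one from the question, rebuilt so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  Name = rep(c("Peter", "Ben", "Mary"), each = 3),
  date = rep(c("2020-01-01", "2020-01-02", "2020-01-03"), times = 3),
  var1 = c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2),
  var3 = c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
)

# One row per Name; every function in the list is applied to every selected column
res <- df %>%
  group_by(Name) %>%
  summarise(across(var1:var3, list(min = min, mean = mean, max = max)))

names(res)  # Name, then one column per (variable, function) pair
```

This stays a flat data frame rather than a nested list, which is usually easier to filter, join, or plot afterwards.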