I have a dataframe which looks like this example, just much larger:
```r
Name <- c('Peter', 'Peter', 'Peter', 'Ben', 'Ben', 'Ben', 'Mary', 'Mary', 'Mary')
date <- c('2020-01-01', '2020-01-02', '2020-01-03', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-01', '2020-01-02', '2020-01-03')
var1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7)
var2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2)
var3 <- c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
df <- data.frame(Name, date, var1, var2, var3)
```
I want to loop over the grouped names and columns to apply a function. I can do it for one group at a time with `apply`, but not over all groups:
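```r
# Summaries for Peter (rows 1-3), one per column var1-var3.
# MARGIN = 2 applies the function over columns, so the result is named var1, var2, var3;
# the result is stored in `res` rather than `list` to avoid masking base::list().
res <- apply(df[1:3, 3:5], 2, function(x) {
  list(summary(x))
})
```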
The output in this case (i.e., for the name "Peter") is a list with the elements "var1", "var2", "var3". The desired output is a list with one element per "Name", each containing the elements "var1", "var2", "var3" (or the other way round: one element per "var", each containing all the "Name" elements).
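For reference, the nested structure described above can be built in base R without any extra packages. This is a minimal sketch, assuming the example data frame from the question; `by_name` and `by_var` are hypothetical names chosen for illustration. `split()` divides the rows by `Name`, and `lapply()` then calls `summary()` on each column within each group.

```r
# Example data from the question (the var columns are numeric)
Name <- c('Peter', 'Peter', 'Peter', 'Ben', 'Ben', 'Ben', 'Mary', 'Mary', 'Mary')
date <- c('2020-01-01', '2020-01-02', '2020-01-03', '2020-01-01', '2020-01-02',
          '2020-01-03', '2020-01-01', '2020-01-02', '2020-01-03')
var1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7)
var2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2)
var3 <- c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
df <- data.frame(Name, date, var1, var2, var3)

vars <- c("var1", "var2", "var3")

# Name -> var: split the value columns by Name, then summarise each column.
# Result is a list named Ben, Mary, Peter; each element is a list named var1, var2, var3.
by_name <- lapply(split(df[, vars], df$Name),
                  function(d) lapply(d, summary))

# var -> Name ("the other way round"): for each column, split it by Name and summarise.
by_var <- lapply(df[, vars],
                 function(x) lapply(split(x, df$Name), summary))

by_name$Peter$var1
by_var$var3$Peter
```

`split()` orders the groups alphabetically (Ben, Mary, Peter), not in the order they appear in the data.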
Answer:
I suggest looking at the package dplyr, which has a lot of handy functions for this kind of data wrangling. You haven't explained what exactly you're trying to do, but in general:
- First, use `group_by()` to group your data frame by the values in one column. It looks like you want to use the column `Name`.
- To keep the same number of rows and compute new values, use `mutate()`.
- To run summary functions that return one row per group, use `summarise()`.
- You can chain these commands together with the pipe operator `%>%`.
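To make the difference between `mutate()` and `summarise()` concrete, here is a small sketch on the question's data (rebuilt with `rep()` for brevity; `with_mean`, `per_group` and `var1_group_mean` are illustrative names, not part of dplyr). Both pipelines compute the mean of `var1` per `Name`; `mutate()` writes it onto every original row, while `summarise()` collapses each group to a single row.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  Name = rep(c("Peter", "Ben", "Mary"), each = 3),
  date = rep(c("2020-01-01", "2020-01-02", "2020-01-03"), times = 3),
  var1 = c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2),
  var3 = c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
)

# mutate(): keeps all 9 rows; each row gets its own group's mean
with_mean <- df %>%
  group_by(Name) %>%
  mutate(var1_group_mean = mean(var1)) %>%
  ungroup()

# summarise(): one row per group (3 rows here), sorted by Name
per_group <- df %>%
  group_by(Name) %>%
  summarise(var1_group_mean = mean(var1))

nrow(with_mean)  # 9
nrow(per_group)  # 3
```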
So in your case, using the data you provided, if for each group you wanted the minimum of `var1`, the mean of `var2`, and the maximum of `var3`, you would run:
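```r
library(dplyr)

df %>%
  # Only needed if the columns were read in as text; in the posted data they are already numeric
  mutate(var1 = as.numeric(var1),
         var2 = as.numeric(var2),
         var3 = as.numeric(var3)) %>%
  group_by(Name) %>%
  summarise(var1_min = min(var1),
            var2_mean = mean(var2),
            var3_max = max(var3))
```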
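First we convert `var1`, `var2`, and `var3` to numeric. In the data frame as posted they are already numeric, so this step changes nothing there, but it is needed if the columns were read in as strings. Then we group by `Name`. Finally, `summarise()` creates a summary data frame with one row per name, containing `Name` plus three new columns named `var1_min`, `var2_mean`, and `var3_max`.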
This is a helpful resource for more.
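For comparison, the same three per-group statistics can be computed in base R without dplyr. This is a swapped-in technique, not part of the answer above: a sketch using `tapply()`, which splits one column by `Name` and applies one function per group. `per_stat` and `res` are hypothetical names.

```r
# Example data from the question
df <- data.frame(
  Name = rep(c("Peter", "Ben", "Mary"), each = 3),
  var1 = c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2),
  var3 = c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
)

# One tapply() call per statistic; each returns a 1-d array
# named by group, sorted alphabetically (Ben, Mary, Peter)
per_stat <- list(
  var1_min  = tapply(df$var1, df$Name, min),
  var2_mean = tapply(df$var2, df$Name, mean),
  var3_max  = tapply(df$var3, df$Name, max)
)

# Combine into one data frame with one row per Name
res <- data.frame(Name = names(per_stat$var1_min),
                  lapply(per_stat, as.vector))
res
```

The dplyr version scales more gracefully when there are many columns or statistics; this one only avoids the package dependency.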
Answer:
In addition to @Christopher Belanger's answer, you might also consider `mutate(across())` or `summarize(across())`, which make it easy to apply the same function or transformation to multiple columns.
An example:
```r
library(dplyr)

df %>%
  group_by(Name) %>%
  summarize(across(var1:var3, ~ mean(as.numeric(.x), na.rm = TRUE)))
```
Output:
```
  Name   var1  var2  var3
  <chr> <dbl> <dbl> <dbl>
1 Ben   0.467 0.367 0.4  
2 Mary  0.567 0.233 0.4  
3 Peter 0.567 0.367 0.567
```
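Building on the `across()` example above, `across()` also accepts a named list of functions, which gets closer to the per-variable `summary()` the question asks for: one output column per variable/function pair, named `{.col}_{.fn}` by default. A sketch, assuming dplyr 1.0 or later; `var_stats` is an illustrative name.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Name = rep(c("Peter", "Ben", "Mary"), each = 3),
  var1 = c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2),
  var3 = c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
)

# Apply min, mean and max to every column from var1 to var3, per Name;
# columns come out as var1_min, var1_mean, var1_max, var2_min, ...
var_stats <- df %>%
  group_by(Name) %>%
  summarise(across(var1:var3, list(min = min, mean = mean, max = max)))

names(var_stats)
```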