I have a dataframe df
which has some very large number of columns. Say one of those columns is named year
, another is named col1
, and another is named col2
. If I am aggregating by year
, how can I aggregate every column of df
except col1
and col2
in one line of code using aggregate()
(I wish to exclude col1
and col2
from the resulting data frame)? Can I use .
notation to do this?
aggregate(. - col1 - col2 ~ year , data = df, FUN = mean, na.action= na.pass)
Something like the above line of code obviously doesn't work, but it may help illuminate what I mean. I want to somehow exclude col1
and col2
from the left hand side of ~
while still using .
to capture all of the other variables (since there are many of them). Is this possible? I realize it may be more effective to simply remove these two variables from df
before aggregating, but is there a way to achieve this result in just one line of code using only aggregate()
?
CodePudding user response:
A tidyverse
option would be
library(dplyr)
df %>%
group_by(year) %>%
summarise(across(-c(col1, col2), mean, na.rm = TRUE), .groups = 'drop')
With aggregate
, we may also use subset
with select
option
aggregate(. ~ year, data = subset(df, select = -c(col1, col2)),
FUN = mean, na.rm = TRUE))
CodePudding user response:
aggregate(.~year, data=df[,colnames(df)[!colnames(df) %in% c("col1","col2")]], FUN=mean)
or data.table
library(data.table)
setDT(df)[,lapply(.SD, mean,na.rm=T),by=year,.SDcols=!c("col1","col2")]