I am aggregating data that contains NAs, so I include na.action = NULL, as explained here. Here is the code that works:
# Toy data.
df <- data.frame(x= 1:10, group= rep(1:2, 5), other_var= rnorm(10))
# Aggregate with formula.
aggregate(formula= x ~ group, data= df, na.action= NULL, FUN= function(i) sum(i))
In my situation I cannot provide the variable names as a formula because they can change. Instead, I pass them as a character vector through the x and by arguments, like this:
var_names <- c("x", "group")
aggregate(x= df[ , var_names[1]], by= list(df[ , var_names[2]]), na.action= NULL, FUN= function(i) sum(i))
This results in an error. Interestingly, leaving out na.action= NULL, i.e. aggregate(x= df[ , var_names[1]], by= list(df[ , var_names[2]]), FUN= function(i) sum(i)), does not produce an error and returns the expected output. How can I keep rows containing NAs from disappearing while providing the column names as a vector? I do need to include na.action= NULL because my real data contains NAs.
CodePudding user response:
You don't have to use the column names in aggregate.formula.
This solution may work for you.
setNames(
  aggregate(cbind(df[, 1], df[, 3]) ~ df[, 2], df, sum, na.rm = TRUE,
            na.action = na.pass),
  colnames(df[, c(2, 1, 3)]))
group x other_var
1 1 25 -0.7313815
2 2 30 0.3231317
Data
(I added NAs)
df <- structure(list(x = 1:10, group = c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L), other_var = c(-1.79458090358371, 0.295106071151792,
NA, -0.589487588239041, 0.325944874015228, NA, 0.737254570399201,
0.47849317537615, NA, 0.139020009150021)), row.names = c(NA,
-10L), class = "data.frame")
CodePudding user response:
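I'm not entirely sure what the issue is: in the formula method, na.action=NULL means that NAs are not handled at all, so all values, including the NAs, are passed to the function untouched. That is already what happens by default in the non-formula version, which has no na.action argument; any extra argument is forwarded to FUN, which is why the call fails.
So I suggest you just omit na.action.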
Using mtcars:
mt <- mtcars
mt$mpg[3] <- NA
var_names <- c("mpg", "cyl")
First, the formula variant:
aggregate(
as.formula(paste(var_names[1], "~", var_names[2])), data= mt,
na.action= NULL,
FUN= function(i) sum(i))
# cyl mpg
# 1 4 NA
# 2 6 138.2
# 3 8 211.4
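As a possible alternative to pasting the formula together, base R's reformulate() can build it from the same character vector. The sketch below is only an illustration: it rebuilds mt and var_names so that it runs on its own, the names fml and res_formula are made up for the example, and it uses na.action = na.pass, which, like NULL, keeps the rows containing NAs.

```r
# Rebuild the example data so the snippet is self-contained.
mt <- mtcars
mt$mpg[3] <- NA
var_names <- c("mpg", "cyl")

# Build the formula mpg ~ cyl from the strings (fml is an illustrative name).
fml <- reformulate(termlabels = var_names[2], response = var_names[1])

# na.pass keeps rows with NA, so sum() sees them and returns NA for that group.
res_formula <- aggregate(fml, data = mt, FUN = sum, na.action = na.pass)
res_formula
```

The formula is passed positionally here so that the call does not depend on the name of the first argument of the formula method.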
Second, the non-formula failure:
aggregate(
x= mt[ , var_names[1]], by= list(mt[ , var_names[2]]),
na.action= NULL,
FUN= function(i) sum(i))
# Error in FUN(X[[i]], ...) : unused argument (na.action = NULL)
Fixing it:
aggregate(
x= mt[ , var_names[1]], by= list(mt[ , var_names[2]]),
# na.action= NULL,
FUN= function(i) sum(i))
# Group.1 x
# 1 4 NA
# 2 6 138.2
# 3 8 211.4
Optionally, if you want a sum for that first group, handle the NAs in the function itself:
aggregate(
x= mt[ , var_names[1]], by= list(mt[ , var_names[2]]),
FUN= function(i) sum(i, na.rm=TRUE))
# Group.1 x
# 1 4 270.5
# 2 6 138.2
# 3 8 211.4
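If the generic Group.1 and x column names are a nuisance, one option is to index the data frame with single brackets so that both x and by stay data frames and keep their names. This is a sketch under the same assumptions as above (mt and var_names are rebuilt here so it runs on its own, and res_named is an illustrative name), not the only way to do it.

```r
# Rebuild the example data so the snippet is self-contained.
mt <- mtcars
mt$mpg[3] <- NA
var_names <- c("mpg", "cyl")

# mt["mpg"] and mt["cyl"] are one-column data frames, so the result
# keeps the original column names instead of Group.1 and x.
res_named <- aggregate(
  x = mt[var_names[1]], by = mt[var_names[2]],
  FUN = sum, na.rm = TRUE)  # na.rm is forwarded to sum() through ...
res_named
```

A data frame is a list, so it satisfies the requirement that by be a list; its column names become the names of the grouping columns in the result.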