Replicating a 2011 example script, the aggregate()
function of base R produces NAN
s. I was wondering if I need to use a more recent version of aggregate or a similar function? Please advise.
Example s1s2.df
can be found here: https://www.dropbox.com/s/dsqina3vuy0774u/df.csv?dl=0
Code that produces NAN
instead of summarised values:
s1.no.present <- aggregate(s1s2.df$no.present[s1s2.df$sabap==-1], by=list(s1s2.df$month.n[s1s2.df$sabap==-1]),sum)[,2]
s1.no.cards <- aggregate(s1s2.df$no.cards[s1s2.df$sabap==-1], by=list(s1s2.df$month.n[s1s2.df$sabap==-1]),sum)[,2]
s2.no.present <- aggregate(s1s2.df$no.present[s1s2.df$sabap==1], by=list(s1s2.df$month.n[s1s2.df$sabap==1]),sum)[,2]
s2.no.cards <- aggregate(s1s2.df$no.cards[s1s2.df$sabap==1], by=list(s1s2.df$month.n[s1s2.df$sabap==1]),sum)[,2]
Incorrect output:
> tibble(s1.no.present)
# A tibble: 12 × 1
s1.no.present
<int>
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
11 NA
12 NA
CodePudding user response:
Use a custom sum function to remove NAs:
#data
s1s2.df <- read.csv("tmp.csv")
mySum <- function(x){ sum(x, na.rm = TRUE) }
aggregate(s1s2.df$no.present[s1s2.df$sabap == -1 ],
by = list(s1s2.df$month.n[s1s2.df$sabap == -1 ]),
mySum)
# Group.1 x
# 1 1 218
# 2 2 369
# 3 3 590
# 4 4 1471
# 5 5 1880
# 6 6 2241
# 7 7 2306
# 8 8 1827
# 9 9 1377
# 10 10 774
# 11 11 281
# 12 12 280
Or use formulas:
aggregate(formula = no.present ~ month.n,
data = s1s2.df[s1s2.df$sabap == -1, ],
FUN = sum)
# month.n no.present
# 1 1 218
# 2 2 369
# 3 3 590
# 4 4 1471
# 5 5 1880
# 6 6 2241
# 7 7 2306
# 8 8 1827
# 9 9 1377
# 10 10 774
# 11 11 281
# 12 12 280