Let's keep this simple. I have the following DT:
library(data.table)
test <- setDT(data.frame(a=c(1,2,2,1,3,3,3,4,4,4),b=c(1:9,9)))
test[a == 4,sum(b),b]
gives:
b V1
1: 8 8
2: 9 9
but why? Should the sum of the second row not be equal to 18? If I use first:
test[a == 4]
we get
a b
1: 4 8
2: 4 9
3: 4 9
so clearly, the sum of b by the second b group should be 18. What am I missing? Note: I am not saying that command makes much sense, but I am trying to understand it at this time.
CodePudding user response:
You shouldn't sum
and group
with the same column b
, otherwise you will always get a new column V1
, whose value is same with the group column b
, because each value in group column b
is unique, which means you will get it self if you sum
by an unique value in b
(V1 = b
) . For your purpose,
you can group with a new variable rleid(b)
, which marks different groups.
dt = data.table(a=c(1,2,2,1,3,3,3,4,4,4),b=c(1:9,9)))
dt[a == 4, .(a = unique(a), b = unique(b), sumb = sum(b)), by = rleid(b)]
# rleid a b sumb
# <int> <num> <num> <num>
#1: 1 4 8 8
#2: 2 4 9 18
P.S. You can also have a better understanding if you create a copy of column b
with dt[,c:=b]
and then try dt[a == 4,sum(b),c]
dt[,c := b][a == 4,sum(b),c]
# c V1
# <num> <num>
#1: 8 8
#2: 9 18