I'm trying to combine .SD with some other columns in a summarize operation, but the result is not what I'm after. For a (silly) example:
library(data.table)
t <- as.data.table(mtcars)
t[, list(cyl = sum(cyl), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')]
I'd like this to return a data.table with 1 row and 3 columns, but instead it returns a data.table with 2 rows and 2 columns. Is there a way around this?
CodePudding user response:
The issue is that list(cyl = sum(cyl), lapply(.SD, mean)) returns something that is not really frame-like: a list whose second element is itself a list. If you look at that outside of the data.table environment, it looks like:
str(list(cyl = sum(t$cyl), lapply(t[,c("mpg","disp")], mean)))
# List of 2
# $ cyl: num 198
# $ :List of 2
# ..$ mpg : num 20.1
# ..$ disp: num 231
when a better return would look something like:
str(c(list(cyl = sum(t$cyl)), lapply(t[,c("mpg","disp")], mean)))
# List of 3
# $ cyl : num 198
# $ mpg : num 20.1
# $ disp: num 231
Instead, concatenate the two lists with c():
t[, c(list(cyl = sum(cyl)), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')]
# cyl mpg disp
# <num> <num> <num>
# 1: 198 20.09062 230.7219
or just concatenate a numeric sum(cyl) to the lapply list (thanks BrianMontgomery):
t[, c(cyl = sum(cyl), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')]
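The shorter form works because c() returns a list as soon as one of its arguments is a list, so the named numeric simply becomes one more list element. A minimal base R sketch of that behaviour (the names means, flat and nested are only for illustration, not from the answer above):

```r
# Base R only; mtcars ships with the datasets package.
means <- lapply(mtcars[c("mpg", "disp")], mean)

# c() with a named numeric and a list yields one flat list of 3 elements
flat <- c(cyl = sum(mtcars$cyl), means)

# list() keeps the lapply() result as a single nested element
nested <- list(cyl = sum(mtcars$cyl), means)

names(flat)     # "cyl" "mpg" "disp"
length(nested)  # 2
```

The flat shape is what data.table needs in j to produce one column per element.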
CodePudding user response:
Using append
setnames(t[, append(sum(cyl), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')], 1, 'cyl')[]
cyl mpg disp
1: 198 20.09062 230.7219
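As a possible variation, if the first argument to append is already a named list, the setnames step should not be needed, since append with its default after argument behaves like c(x, values). A sketch under that assumption (dt and res are illustrative names, not from the answer above):

```r
library(data.table)
dt <- as.data.table(mtcars)

# Naming the first element up front avoids the setnames() step
res <- dt[, append(list(cyl = sum(cyl)), lapply(.SD, mean)),
          .SDcols = c("mpg", "disp")]
res
```

This should give the same single row with columns cyl, mpg and disp.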