Use other columns with .SD in data.table

Time:10-25

I'm trying to combine .SD with some other columns in a summarize operation, but the output doesn't have the shape I'm after. A (silly) example:

library(data.table)

t <- as.data.table(mtcars)

t[, list(cyl = sum(cyl), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')]

I'd like this to return a data.table with 1 row and 3 columns, but instead it returns a data.table with 2 rows and 2 columns. Is there a way around this?

CodePudding user response:

The issue is that list(cyl = sum(cyl), lapply(.SD, mean)) returns a nested list: its second element is itself a list, so the result is not really frame-like. Evaluated outside of the data.table environment, it looks like this:

str(list(cyl = sum(t$cyl), lapply(t[,c("mpg","disp")], mean)))
# List of 2
#  $ cyl: num 198
#  $    :List of 2
#   ..$ mpg : num 20.1
#   ..$ disp: num 231

whereas a better return value is a flat list with one element per column, something like:

str(c(list(cyl = sum(t$cyl)), lapply(t[,c("mpg","disp")], mean)))
# List of 3
#  $ cyl : num 198
#  $ mpg : num 20.1
#  $ disp: num 231

So in j, concatenate the two lists with c() instead of nesting one inside the other:

t[, c(list(cyl = sum(cyl)), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')]
#      cyl      mpg     disp
#    <num>    <num>    <num>
# 1:   198 20.09062 230.7219

or just concatenate the numeric sum(cyl) with the lapply list, since c() of a number and a list also returns a flat list (thanks BrianMontgomery):

t[, c(cyl = sum(cyl), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')]
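
As a rough check, the sketch below runs both forms and confirms they give the same one-row result. The name dt is only an illustrative stand-in for t above, and the grouped call at the end (by = gear) is an assumed extension of the same pattern, not something from the question.

```r
library(data.table)

dt <- as.data.table(mtcars)  # 'dt' is an illustrative name, same data as 't' above

# Form 1: wrap the scalar in list(), then c() joins two lists into one flat list
res_list <- dt[, c(list(cyl = sum(cyl)), lapply(.SD, mean)),
               .SDcols = c("mpg", "disp")]

# Form 2: c() of a length-1 numeric and a list also yields a flat list
res_num <- dt[, c(cyl = sum(cyl), lapply(.SD, mean)),
              .SDcols = c("mpg", "disp")]

# Both should be 1 row x 3 columns with the same names and values
stopifnot(nrow(res_list) == 1L,
          identical(names(res_list), c("cyl", "mpg", "disp")),
          isTRUE(all.equal(res_list, res_num)))

# Assumed extension: the same pattern per group, one row per value of gear
by_gear <- dt[, c(list(cyl = sum(cyl)), lapply(.SD, mean)),
              by = gear, .SDcols = c("mpg", "disp")]
print(by_gear)
```

The list() form is probably the safer habit: it keeps working if the extra summary is not a length-1 atomic value, whereas c() on a longer vector would spread it across several columns.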

CodePudding user response:

Using append, then naming the otherwise unnamed first column with setnames:

setnames(t[,  append(sum(cyl), lapply(.SD, mean)), .SDcols = c('mpg', 'disp')], 1, 'cyl')[]
#    cyl      mpg     disp
# 1: 198 20.09062 230.7219