R data.table: how to use R variables that contain column names?

Time:12-12

I've read the data.table documentation several times but I still can't wrap my head around how to do some operations; more generally I still haven't understood the underlying "philosophy" on how to work with variable names. Consider this example problem:

I have a data table with variables 'a', 'b', 'c', 'd':

> dt <- data.table(a=c(1,1,2), b=1:3, c=11:13, d=21:23)
> dt
   a b  c  d
1: 1 1 11 21
2: 1 2 12 22
3: 2 3 13 23

Suppose my script interactively asks the user to input a column name and corresponding value that should be used to select rows. These two variables are stored in rowselectname and rowselectvalue:

> rowselectname
[1] "a"
> rowselectvalue
[1] 1

The script also interactively asks the user to select some columns of interest; their names are stored in colselectnames:

> colselectnames
[1] "b" "d"

Now I want to create a new data table from dt, with the rows for which rowselectname has the value rowselectvalue, and with the columns given by colselectnames. The only way I finally managed to do this is as follows:

> newdt <- dt[get(rowselectname)==rowselectvalue, ..colselectnames]
> newdt
   b  d
1: 1 21
2: 2 22

What I don't understand is why I have to use get() for the first selection and .. for the second. Why not get() for both (it doesn't work)? Or why not .. for both (doesn't work either)? This seems inconsistent to me, but maybe there's another way of doing this with a more consistent syntax. I think the most obvious should simply be newdt <- dt[rowselectname==rowselectvalue, colselectnames], which is how the rest of R seems to work.

I'd really appreciate someone explaining to me how to look at this to make sense of the syntax.

CodePudding user response:

We can pass colselectnames to .SDcols and select .SD in j. Because rowselectname holds the column name as a string, get() is used to return the values of that column. The same can be done by converting the string to a symbol and evaluating it: eval(as.name(rowselectname)).

dt[get(rowselectname)==rowselectvalue, .SD, .SDcols =  colselectnames]
   b  d
1: 1 21
2: 2 22
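
As a minimal sketch of the two equivalent forms mentioned above, the following rebuilds the question's dt and variables and compares the get() version with the eval(as.name()) version. The names res_get and res_sym are only illustrative.

```r
library(data.table)

# Data and inputs as given in the question
dt <- data.table(a = c(1, 1, 2), b = 1:3, c = 11:13, d = 21:23)
rowselectname  <- "a"
rowselectvalue <- 1
colselectnames <- c("b", "d")

# get() looks the string up as a column of dt
res_get <- dt[get(rowselectname) == rowselectvalue, .SD, .SDcols = colselectnames]

# as.name() turns "a" into the symbol a; eval() then evaluates it within dt
res_sym <- dt[eval(as.name(rowselectname)) == rowselectvalue, .SD, .SDcols = colselectnames]

# Both forms should select the same rows and columns
stopifnot(isTRUE(all.equal(res_get, res_sym)))
print(res_get)
```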

The .. prefix is only understood in j, so to use it for the row filter as well, do the column lookup in a nested call:

dt[dt[, ..rowselectname][[1]] == rowselectvalue, ..colselectnames]
   b  d
1: 1 21
2: 2 22
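
On the "why not the obvious syntax" part of the question: inside [.data.table, i and j are evaluated with the columns of dt available as variables, so rowselectname==rowselectvalue compares the string "a" with 1 rather than column a. One way to get closer to data.frame-style indexing, sketched here under the assumption that a string-based lookup is acceptable, is to extract the filter column with [[ ]] and pass with = FALSE so that j is treated as a character vector of column names.

```r
library(data.table)

# Data and inputs as given in the question
dt <- data.table(a = c(1, 1, 2), b = 1:3, c = 11:13, d = 21:23)
rowselectname  <- "a"
rowselectvalue <- 1
colselectnames <- c("b", "d")

# dt[[rowselectname]] extracts the column as a plain vector (list-style lookup);
# with = FALSE tells data.table to read j as column names, not as an expression
newdt <- dt[dt[[rowselectname]] == rowselectvalue, colselectnames, with = FALSE]
print(newdt)
```

Here both the row filter and the column selection are driven by ordinary R strings, at the cost of not using data.table's column-as-variable scoping.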

CodePudding user response:

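With the upcoming data.table version 1.14.3 (the development version at the time of writing; the feature later reached CRAN in 1.15.0), get() is no longer needed for this, and you'll be able to use the new env parameter: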

A new interface for programming on data.table has been added, closing #2655 and many other linked issues. It is built using base R's substitute-like interface via a new env argument to [.data.table. For details see the new vignette programming on data.table, and the new ?substitute2 manual page.

# install dev version
install.packages("https://github.com/Rdatatable/data.table/archive/master.tar.gz", repos = NULL, type = "source")

library(data.table)

dt[rowselectname==rowselectvalue, ..colselectnames, env=list(rowselectname=rowselectname)]

   b  d
1: 1 21
2: 2 22
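
The env interface can also cover both the row filter and the column selection in one place. The sketch below assumes data.table 1.15.0 or later and skips the call on older versions; rowvar, rowval and cols are hypothetical placeholder names chosen for this example, not part of the data.table API.

```r
library(data.table)

# Data and inputs as given in the question
dt <- data.table(a = c(1, 1, 2), b = 1:3, c = 11:13, d = 21:23)
rowselectname  <- "a"
rowselectvalue <- 1
colselectnames <- c("b", "d")

newdt <- NULL
# Assumption: the env argument is only available from data.table 1.15.0 onwards
if (packageVersion("data.table") >= "1.15.0") {
  newdt <- dt[rowvar == rowval, cols,
              env = list(
                rowvar = rowselectname,          # string "a" is substituted as the symbol a
                rowval = rowselectvalue,         # numeric value is substituted as-is
                cols   = as.list(colselectnames) # list of names becomes list(b, d) in j
              )]
  print(newdt)
}
```

With this form every user-supplied piece is listed in env, so the expressions in i and j stay free of get() and the .. prefix.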