Extract columns using a character variable in both data frames and data tables

Time:01-30

Data frames and data.tables behave differently when columns are selected with a variable that holds column names. Is there a single expression that works for both types of data structure? Why do I care? I have some user-defined functions that were originally written for data frames, and I would like them to work for both data frames and data.tables.

Here is an example. Make a data frame and a data.table.

> fr <- data.frame(a = 1, b = 1)
> tb <- data.table::data.table(a = 1, b = 1)

Here is a vector with a column name.

> v <- 'a'

It can be used to extract a column from the data frame like this:

> fr[, v]
[1] 1

But a data.table requires a different syntax:

> tb[, v]
Error in `[.data.table`(tb, , v) :
  j (the 2nd argument inside [...]) is a single symbol but column name 'v' is not found. Perhaps you
intended DT[, ..v]. This difference to data.frame is deliberate and explained in FAQ 1.1.

> tb[, ..v]
   a
1: 1

> tb[, v, with = FALSE]
   a
1: 1
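Both data.table forms also accept a row index in `i`, which is what the functions described above need. A small sketch; the column values `1:3` and `4:6` are made up for illustration:

```r
library(data.table)

tb <- data.table(a = 1:3, b = 4:6)
v <- "a"

# '..v' tells data.table to look v up in the calling scope rather than
# among the columns of tb; the row index goes in i as usual.
r1 <- tb[2:3, ..v]

# with = FALSE makes data.table treat j as an ordinary vector of column names.
r2 <- tb[2:3, v, with = FALSE]

print(r1)
print(r2)
```

Both calls return a one-column data.table holding rows 2 and 3 of `a`.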

Neither of the forms that work for data.tables works for data frames:

> fr[, ..v]
Error in `[.data.frame`(fr, , ..v) : object '..v' not found

> fr[, v, with = FALSE]
Error in `[.data.frame`(fr, , v, with = FALSE) :
  unused argument (with = FALSE)
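A straightforward, if verbose, workaround is to branch on the class and use each structure's own syntax. This is a minimal sketch, not a single shared expression; the helper name `select_cols` and its arguments are hypothetical, and the data are made up:

```r
library(data.table)

# Hypothetical helper (not part of base R or data.table): return the columns
# named in the character vector `cols`, restricted to `rows`, as a table of
# the same class as `x`. `rows` defaults to all rows.
select_cols <- function(x, cols, rows = seq_len(nrow(x))) {
  if (is.data.table(x)) {
    # data.table: with = FALSE makes j an ordinary character vector of names
    x[rows, cols, with = FALSE]
  } else {
    # data.frame: drop = FALSE keeps a one-column result as a data.frame
    # instead of simplifying it to a vector, matching the data.table result
    x[rows, cols, drop = FALSE]
  }
}

fr <- data.frame(a = 1:3, b = 4:6)
tb <- data.table(a = 1:3, b = 4:6)
v <- "a"

print(select_cols(fr, v, 2:3))
print(select_cols(tb, v, 2:3))
```

One caveat: data.table evaluates `i` among the columns first, so a column literally named `rows` would shadow the argument. Also, as I understand data.table's rules for "data.table-aware" code, a helper like this placed inside a package only gets data.table semantics if that package imports data.table.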

Is there an approach that works for both data frames and data.tables?

I know I can use this list-style indexing for both:

> fr[[v]]
[1] 1

> tb[[v]]
[1] 1

But that only works when I don't also need a row index.
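If a single column returned as a plain vector is enough, the row index can be applied to the result of `[[` rather than inside the brackets. This behaves the same for both classes because `[[` with a column name returns the column vector in each case. A sketch with made-up data:

```r
library(data.table)

fr <- data.frame(a = 1:3, b = 4:6)
tb <- data.table(a = 1:3, b = 4:6)
v <- "a"
rows <- 2:3

# [[ extracts the column as a vector; the row index is then applied to
# that vector (positions or a logical vector both work).
print(fr[[v]][rows])
print(tb[[v]][rows])
```

This handles one column at a time and drops the tabular structure, so it only fits functions that work on vectors.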

Based on FAQ 1.5, I thought I could set an option to get the desired behavior. That would be a solution, but it does not behave as I expected:

> options(datatable.WhenJisSymbolThenCallingScope=TRUE)

> options()$datatable.WhenJisSymbolThenCallingScope
[1] TRUE

> tb[, v]
Error in `[.data.table`(tb, , v) :
  j (the 2nd argument inside [...]) is a single symbol but column name 'v' is not found. Perhaps you
intended DT[, ..v]. This difference to data.frame is deliberate and explained in FAQ 1.1.

Am I confused?

> packageVersion('data.table')
[1] ‘1.14.2’

Answer:

fr <- data.frame(a = 1, b = 1)
tb <- data.table::data.table(a = 1, b = 1)

v <- 'a'

subset(fr, , get(v))
#>   a
#> 1 1

subset(tb, , get(v))
#>    a
#> 1: 1
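The empty second argument of `subset()` is the row condition, so rows can be filtered in the same call. As far as I know, both the data.frame and data.table methods require that condition to be logical rather than numeric row positions. The `?subset` help page also describes `subset()` as a convenience for interactive use and recommends the standard `[` operators for programming, because of its non-standard evaluation. A sketch with made-up data:

```r
library(data.table)

fr <- data.frame(a = 1:3, b = 4:6)
tb <- data.table(a = 1:3, b = 4:6)
v <- "a"

# The second argument is a logical row condition evaluated among the columns;
# in the select argument, get(v) resolves to the position of column "a".
r_fr <- subset(fr, b > 4, get(v))
r_tb <- subset(tb, b > 4, get(v))

print(r_fr)
print(r_tb)
```

Each result keeps the class of its input: a data.frame for `fr` and a data.table for `tb`.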

Answer:

.subset2 accepts a string:

.subset2(tb, v)
.subset2(fr, v)
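`.subset2()` is the internal form of `[[` that skips method dispatch, so it treats both objects as plain lists of columns. Applying it column by column and then indexing each result gives several columns and a row index at once. The helper name `pick_cols` is hypothetical, the data are made up, and the result is always a plain data.frame, even for data.table input:

```r
library(data.table)

# Hypothetical helper: extract the columns named in `cols`, keep positions
# `rows`, and return a plain data.frame for any list-based table.
pick_cols <- function(x, cols, rows) {
  out <- lapply(cols, function(cl) .subset2(x, cl)[rows])
  names(out) <- cols
  as.data.frame(out)
}

fr <- data.frame(a = 1:3, b = 4:6, c = 7:9)
tb <- data.table(a = 1:3, b = 4:6, c = 7:9)

print(pick_cols(fr, c("a", "c"), 2:3))
print(pick_cols(tb, c("a", "c"), 2:3))
```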