Is there a way to self-reference a data.table in i?

Time:12-08

Consider the standard data.table syntax DT[i, j, ...]. Since .SD is only defined in j and is NULL in i, is there any way to refer to the current data.table from a function called in i, either implicitly (preferred) or explicitly (via something like .SD)?

Use Case

I would like to write a function that filters on standard columns. The column names are the same across multiple tables and somewhat verbose. To save typing, I would like to write a function like this:

library(data.table)
dt <- data.table(postal_code   = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))
dt
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      SPEEDO        Walker
#> 3:      USA421      Thompson

# Filter all customers whose postal code starts with x
# and whose surname starts with y
extract <- function(x, y, DT) {
  DT[, startsWith(postal_code, x) & startsWith(customer_name, y)]
}


# does not work
dt[extract("USA", "T", .SD)]
#> Error in .checkTypos(e, names_x): Object 'postal_code' not found.
#>    Perhaps you intended postal_code

# works, but requires naming the data.table explicitly;
# another drawback is that it cannot be applied to, e.g., a grouped .SD
# in a nested call
dt[extract("USA", "T", dt)]
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      USA421      Thompson

Desired (pseudo code)

dt[extract("USA", "T")]
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      USA421      Thompson

# but also
# subsequent steps in j
dt[extract("USA", "T"), relevant := TRUE][]
#>    postal_code customer_name relevant
#> 1:      USA123        Taylor     TRUE
#> 2:      SPEEDO        Walker       NA
#> 3:      USA421      Thompson     TRUE

# using other data.tables
another_dt[extract("USA", "T")]
yet_another_dt[extract("USA", "T")]

CodePudding user response:

I'm not a data.table expert, but you can try the following workaround:

> dt[,.SD[extract("USA", "T", .SD)]]
   postal_code customer_name
1:      USA123        Taylor
2:      USA421      Thompson

Here the self-reference happens in j, where .SD is defined.
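
If passing the table explicitly is acceptable, the same filter can be wrapped so that the table is the first argument and the i expression lives inside the helper. This is only a sketch: filter_customers and flag_customers are hypothetical names, not part of data.table, and it does not give the implicit self-reference in i that the question asks for.

```r
library(data.table)

dt <- data.table(postal_code   = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))

# Hypothetical helper: returns the matching rows of DT.
# Assumes DT has no columns named x or y, since those would mask the arguments.
filter_customers <- function(DT, x, y) {
  DT[startsWith(postal_code, x) & startsWith(customer_name, y)]
}

# Hypothetical helper: flags the matching rows by reference via :=
flag_customers <- function(DT, x, y) {
  DT[startsWith(postal_code, x) & startsWith(customer_name, y), relevant := TRUE]
  invisible(DT)
}

res <- filter_customers(dt, "USA", "T")  # rows for Taylor and Thompson
flag_customers(dt, "USA", "T")           # adds column 'relevant' to dt itself
dt[]
```

Because := modifies by reference, flag_customers() changes dt in place; rows that do not match are left as NA in the new column.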

CodePudding user response:

Here is a possible approach...

#create named vector: names are the columns, values are the required prefixes
mystr <- c(postal_code = "USA", customer_name = "T")
#build query text
query <- paste0("grepl(\"^", mystr, "\", ", names(mystr), ")", collapse = " & ")
#eval/parse dynamic text
dt[eval(parse(text = query)), ]
#    postal_code customer_name
# 1:      USA123        Taylor
# 2:      USA421      Thompson
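
A variant of the same idea that avoids building and parsing text might be to have the helper return an unevaluated call and wrap it in eval() in i, just as with eval(parse(...)) above. This is a sketch under assumptions: extract_expr is a hypothetical name, and the column names are hard-coded as in the question.

```r
library(data.table)

dt <- data.table(postal_code   = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))

# Hypothetical helper: returns an unevaluated filter expression.
# bquote() splices the values of x and y into the call through .()
extract_expr <- function(x, y) {
  bquote(startsWith(postal_code, .(x)) & startsWith(customer_name, .(y)))
}

# eval() in i lets data.table evaluate the returned call against dt's columns
res <- dt[eval(extract_expr("USA", "T"))]
res
```

The call to eval() still has to be typed each time, but no table is passed to the helper, so the same line should work for any table that has these two columns.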