Get name of column inside .SD call in data.table

Time:08-31

I am trying to use the names of the variables in .SD, but I can't manage to get them. In the toy example below, I need to append the suffix " by {z}" to every cell in the table that contains an "a". The {z} part stands for the name of the variable, and I need to do this for all variables. The input table and the desired output table are shown below.

library(data.table)
# Input 
ip <- data.table(x = c("ab", "cd", "ac", "de"),
                 y = c("fr", "ad", "fa", "we"))

ip[]
#>     x  y
#> 1: ab fr
#> 2: cd ad
#> 3: ac fa
#> 4: de we

# Desired Output table

op <- data.table(x = c("ab by x", "cd", "ac by x", "de"),
                 y = c("fr", "ad by y", "fa by y", "we"))
op[]
#>          x       y
#> 1: ab by x      fr
#> 2:      cd ad by y
#> 3: ac by x fa by y
#> 4:      de      we

One way that I thought could work is to use deparse(substitute(x)) as in the example below.

add_if_pattern <- function(x, pattern) {
  y <- deparse(substitute(x))
  fifelse(test = grepl(pattern, x),
          paste(x, "by",  y),
          x)
}

pattern <- "a"
z <- "blah"
q <- "bleh"
add_if_pattern(z, pattern) ## add the pattern
#> [1] "blah by z"
add_if_pattern(q, pattern) ## does not add the pattern
#> [1] "bleh"

However, when I use that function inside lapply(.SD) in data.table, it does something unexpected.

tp <- copy(ip)
ip <- copy(tp)

vars <- names(ip)
ip[, (vars) := lapply(.SD,add_if_pattern, pattern)]
ip[]
#>               x            y
#> 1: ab by X[[i]]           fr
#> 2:           cd ad by X[[i]]
#> 3: ac by X[[i]] fa by X[[i]]
#> 4:           de           we

I don't need X[[i]], but the names of the original variables, either x or y. I also tried using names(.SD) but it seems that it is outside of the scope and thus got an error (see below). Could you please give a hand?

Thanks.

ip <- copy(tp)
ip[, (vars) := lapply(.SD,
                      \(x){
                        fifelse(test = grepl("classified", x),
                                paste(x, "by",  names(.SD)[..x]),
                                x)
                      })]
#> Error in `[.data.table`(ip, , `:=`((vars), lapply(.SD, function(x) {: Variable 'x' is not found in calling scope. Looking in calling scope because this symbol was prefixed with .. in the j= parameter.

Created on 2022-08-30 with reprex v2.0.2

CodePudding user response:

Consider passing the column name as an extra argument and using Map to walk over .SD and names(.SD) in parallel:

add_if_pattern <- function(x, pattern, colnm) {
  # colnm is the column name, supplied by the caller
  fifelse(test = grepl(pattern, x),
          paste(x, "by", colnm),
          x)
}
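
The explicit column-name argument is needed because lapply calls its function as FUN(X[[i]], ...), so deparse(substitute(x)) sees the expression X[[i]] rather than a column name. Below is a minimal base-R sketch of that behavior. The helper `arg_label` and the list `cols` are made up for illustration and are not part of the question or the answer.

```r
# Hypothetical helper: returns the expression its argument was called with.
arg_label <- function(x) deparse(substitute(x))

z <- "blah"
direct <- arg_label(z)                  # called with the symbol z

# Illustrative stand-in for .SD: a named list of columns.
cols <- list(x = c("ab", "cd"), y = c("fr", "ad"))
via_lapply <- lapply(cols, arg_label)   # lapply calls FUN(X[[i]], ...) internally

# Map() iterates over several vectors in parallel, so each column
# can be paired with its own name.
via_map <- Map(function(col, nm) nm, cols, names(cols))

print(direct)
print(unlist(via_lapply))
print(unlist(via_map))
```

Here `direct` is "z", while every element of `via_lapply` is "X[[i]]", which matches the suffix seen in the question's output.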

-testing

ip[, (vars) := Map(function(x, nm) add_if_pattern(x, pattern, nm),
                   .SD, names(.SD)),
   .SDcols = vars]

-output

> ip
         x       y
    <char>  <char>
1: ab by x      fr
2:      cd ad by y
3: ac by x fa by y
4:      de      we
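
As a possible alternative (not part of the answer above), looping over the column names makes the name directly available as the loop variable, and data.table's set() can then update only the matching rows by reference. This is a rough sketch that assumes the same toy table and pattern as in the question. The names `hit` and `idx` are illustrative.

```r
library(data.table)

ip <- data.table(x = c("ab", "cd", "ac", "de"),
                 y = c("fr", "ad", "fa", "we"))
pattern <- "a"

# Loop over column names so the name is available as `v`.
for (v in names(ip)) {
  hit <- grepl(pattern, ip[[v]])   # TRUE where the cell contains the pattern
  idx <- which(hit)
  if (length(idx) > 0L) {
    # Update only the matching rows of column v, by reference.
    set(ip, i = idx, j = v, value = paste(ip[[v]][idx], "by", v))
  }
}

print(ip)
```

The Map approach keeps everything inside a single `[` call, whereas the loop avoids passing the name through an extra function argument. Which reads better is largely a matter of taste.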