In base or data.table for R, use a function, evaluated on a column, to select rows?

Time:01-28

Given a data.table DT with a column Col1, I want to select the rows of DT where the values x in Col1 satisfy some boolean expression, for example f(x) == TRUE or f(x) <= 4, and then do more data.table operations on those rows.

For example, I tried something like

DT[f(Col1) == TRUE, Col2 := 2]

which does not work because f() acts on single values, not vectors. Using lapply() seems to work, but it takes a long time to run on a very large DT.

A workaround would be to create a column and use that to select the rows:

DT[, fvalues := f(Col1)][fvalues == TRUE, Col2 := 2]

but it would be better not to increase the size of DT.
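If the extra column is the main concern, one option is to drop it again by reference once it has served its purpose. The following is only a minimal sketch: the predicate f and the sample values are hypothetical, and f is assumed to be vectorized here so that the helper column can be built in one call.

```r
library(data.table)

# Hypothetical vectorized predicate, used only for illustration
f <- function(x) x > 0.5

DT <- data.table(Col1 = c(0.2, 0.8, 0.6), Col2 = 0)

# Add the helper column, use it to select rows, then remove it by reference
DT[, fvalues := f(Col1)]
DT[fvalues == TRUE, Col2 := 2]
DT[, fvalues := NULL]
```

After the last line DT is back to its original two columns, so the size increase is only temporary.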

CodePudding user response:

I think the problem is perhaps in the parts that aim to modify the data.table in place (i.e. your := parts). I don't think you can filter in place as such; filtering really requires writing to a new memory location.

This works for filtering, whilst creating a new object:

library(data.table)

f <- function(x) x > 0.5

DT <- data.table(Col1 = runif(10))

DT[f(Col1),]

#>         Col1
#> 1: 0.7916055
#> 2: 0.5391773
#> 3: 0.6855657
#> 4: 0.5250881
#> 5: 0.9089948
#> 6: 0.6639571

To do more data.table operations on a filtered table, assign to a new object and work with that one:

DT2 <- DT[f(Col1),]
DT2[, Col2 := 2]
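For comparison, when f is vectorized (as it is in this example), a logical expression in i can be combined with := in j, and data.table then assigns only to the matching rows of the existing table. A minimal sketch with made-up values, assuming Col2 is created first so the non-matching rows have a defined value:

```r
library(data.table)

# Vectorized predicate: returns one logical per element of x
f <- function(x) x > 0.5

DT <- data.table(Col1 = c(0.2, 0.8, 0.6))

# Create the column, then overwrite it only where f(Col1) is TRUE
DT[, Col2 := 0]
DT[f(Col1), Col2 := 2]
```

This changes DT itself rather than a filtered copy, so it is only appropriate when the other rows should stay in the table.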

Perhaps I've misunderstood your problem though - what function are you using? Could you post more code so we can replicate your problem more precisely?

CodePudding user response:

If f is a function working only on scalar values, you could Vectorize it:

DT[Vectorize(f)(Col1)]

Not sure this fully answers your question, because Vectorize uses lapply/mapply internally, so it is unlikely to be faster than calling lapply() yourself.
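A similar idea without the wrapper is to call vapply() directly in i, which avoids storing an extra column and checks that every call returns a single logical value. This is a sketch only: the scalar-only f below is a hypothetical stand-in for the real function, and it still calls f once per row, so no speed-up over lapply() should be assumed.

```r
library(data.table)

# Hypothetical function that only handles one value at a time
f <- function(x) if (x > 0.5) TRUE else FALSE

DT <- data.table(Col1 = c(0.2, 0.8, 0.6), Col2 = 0)

# Evaluate f element by element and update the matching rows by reference
DT[vapply(Col1, f, logical(1)), Col2 := 2]
```

If f can be rewritten with vectorized operations (for example x > 0.5 instead of an if statement), that is likely to be the more effective fix for a very large DT.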
