Given a data table DT with a column Col1, I want to select the rows of DT where the values x in Col1 satisfy some boolean expression, for example f(x) == TRUE or f(x) <= 4, and then do more data.table operations on those rows.
For example, I tried something like
DT[f(Col1) == TRUE, Col2 := 2]
which does not work because f() acts on single values, not vectors. Using lapply() seems to work, but it takes a long time to run on a very large DT.
A workaround would be to create a column and use it to select the rows:
DT[, fvalues := f(Col1)][fvalues == TRUE, Col2 := 2]
but it would be better not to increase the size of DT.
CodePudding user response:
I think the problem is in the parts that try to modify the data.table in place (i.e. your := parts). I don't think you can filter in place, since filtering requires writing to a new memory location.
This works for filtering, whilst creating a new object:
library(data.table)
f <- function(x) x > 0.5
DT <- data.table(Col1 = runif(10))
DT[f(Col1),]
#> Col1
#> 1: 0.7916055
#> 2: 0.5391773
#> 3: 0.6855657
#> 4: 0.5250881
#> 5: 0.9089948
#> 6: 0.6639571
To do more data.table operations on a filtered table, assign it to a new object and work with that one:
DT2 <- DT[f(Col1),]
DT2[, Col2 := 2]
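For comparison, here is a minimal sketch of combining a filter in i with := in j, assuming f is vectorised as in the example above. In that case data.table assigns by reference to the matching rows only, so no new object or helper column is needed. The seed is an assumption added only to make the sketch reproducible.

```r
library(data.table)

# Vectorised predicate, same as in the example above
f <- function(x) x > 0.5

set.seed(1)  # assumption: seed added only for reproducibility
DT <- data.table(Col1 = runif(10))

# Filter in i, assign in j: only rows where f(Col1) is TRUE are updated,
# and DT itself is modified by reference (no copy, no helper column)
DT[f(Col1), Col2 := 2]

# Rows that did not match the filter hold NA in the new column
print(DT)
```

Whether this applies depends on f accepting a whole column at once; for a scalar-only f the call in i would need to be wrapped first.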
Perhaps I've misunderstood your problem though - what function are you using? Could you post more code so we can replicate your problem more precisely?
CodePudding user response:
If f is a function working only on scalar values, you could Vectorize it:
DT[Vectorize(f)(Col1)]
Not sure this fully answers your question, because Vectorize uses lapply/mapply internally, so it may not be any faster on a very large DT.
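A small sketch of how this could look end to end, assuming a hypothetical scalar-only f (the real f was not posted). It uses the wrapped function directly in i together with :=, and shows vapply() as an explicit alternative that also checks each result is a single logical value.

```r
library(data.table)

# Hypothetical scalar-only predicate: if() rejects conditions of length > 1
# (an error in R >= 4.2, a warning in earlier versions)
f <- function(x) if (x > 0.5) TRUE else FALSE

DT <- data.table(Col1 = c(0.1, 0.7, 0.4, 0.9))

# Vectorize() wraps f so that it is applied element by element via mapply()
vf <- Vectorize(f)
DT[vf(Col1), Col2 := 2]

# Explicit element-wise loop; vapply() enforces one logical per element
DT[vapply(Col1, f, logical(1)), Col3 := 2]

print(DT)
```

Both forms still call f once per row, so the speed concern from the question remains; the only gain is avoiding the extra fvalues column.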