I am trying to filter the rows of a data.table by text, in a way similar to dplyr::filter (I am using data.table for efficiency reasons).
However, my data.table filter only returns rows where the string matches exactly, whereas dplyr::filter returns rows where the pattern appears anywhere in the string, not only exact matches.
See below for an example.
library(data.table)
df <- data.frame(first = c("value_1 and value_2", "value_2", "value_1", "value_1"),
                 second = c(1, 2, 3, 4))
dt.output <- setDT(df)[first %in% c("value_1")]
filter.output <- dplyr::filter(df, grepl("value_1", first))
dt.output only returns the rows whose first value is exactly "value_1" (rows 3 and 4), while filter.output returns every row that contains "value_1" (rows 1, 3 and 4).
Is it possible to use data.table to filter text and get the same results as dplyr::filter?
Answer:
This behavior is not a dplyr::filter vs data.table difference. It is just that %in% looks for exact (whole-string) matches, while grepl returns TRUE for substring (regular-expression) matches as well. If we use grepl inside the data.table, it works too:
library(data.table)
setDT(df)[grepl("value_1", first)]
first second
1: value_1 and value_2 1
2: value_1 3
3: value_1 4
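To see the difference without data.table or dplyr involved, here is a small base-R sketch comparing the two operators on the same character vector. The vector x is just a copy of the example's first column, introduced for illustration:

```r
# Copy of the 'first' column from the example
x <- c("value_1 and value_2", "value_2", "value_1", "value_1")

# %in% compares whole strings: only elements exactly equal to "value_1" are TRUE
exact <- x %in% "value_1"
print(exact)
# [1] FALSE FALSE  TRUE  TRUE

# grepl() looks for the pattern anywhere inside each string
partial <- grepl("value_1", x)
print(partial)
# [1]  TRUE FALSE  TRUE  TRUE
```

Either logical vector can be used as the i argument of a data.table, so the choice of operator, not the package, decides which rows come back.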
Or we can use data.table's %like% operator, which performs the same pattern match:
setDT(df)[first %like% "value_1"]
first second
1: value_1 and value_2 1
2: value_1 3
3: value_1 4
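The original filter used %in% with a vector, c("value_1"), which suggests you may eventually want to match several values. A minimal sketch, assuming you want rows that contain any of several substrings: either collapse the patterns into one regular expression with "|", or, if the search strings may contain regex metacharacters such as "." or "(", match each one literally with grepl(..., fixed = TRUE) and OR the results together. The patterns vector below is a hypothetical example, not taken from the question.

```r
library(data.table)

dt <- data.table(
  first  = c("value_1 and value_2", "value_2", "value_1", "value_1"),
  second = c(1, 2, 3, 4)
)

# Hypothetical substrings to search for
patterns <- c("and", "value_2")

# Option 1: a single regular expression, here "and|value_2"
by_regex <- dt[grepl(paste(patterns, collapse = "|"), first)]

# Option 2: literal (non-regex) matching, one grepl() call per pattern,
# combined element-wise with a logical OR
hits <- Reduce(`|`, lapply(patterns, function(p) grepl(p, dt$first, fixed = TRUE)))
by_fixed <- dt[hits]

print(by_regex)
print(by_fixed)
```

Both options keep rows 1 and 2 here. The fixed = TRUE route is safer when the patterns come from user input, because no character in them is interpreted as regex syntax.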