I would like to filter a data frame by checking to see if a column starts with a number that is also stored in a numeric vector.
For example, consider the data frame below, with columns Comment and Comment.Id:
Comment | Comment.Id
This comment | 34_1_1_1
That comment | 24
Another comment | 54_1_1
Please comment | 234
More comments | 13_1_1
Many comments | 12
Comment again | 119_1_1
And the following vector num:
num <- c(34, 54, 234, 13, 119)
I would like to look through the Comment.Id column and keep every row whose comment Id starts with a number contained in the num vector.
The resulting data frame would look like this:
Comment | Comment.Id
This comment | 34_1_1_1
Another comment | 54_1_1
Please comment | 234
More comments | 13_1_1
Comment again | 119_1_1
I am using the R language.
CodePudding user response:
df <- structure(list(Comment = c("This comment", "That comment", "Another comment",
"Please comment", "More comments", "Many comments", "Comment again"
), Comment.Id = c("34_1_1_1", "24", "54_1_1", "234", "13_1_1",
"12", "119_1_1")), row.names = c(NA, -7L), class = "data.frame")
num <- c(34, 54, 234, 13, 119)
How about:
## str_extract() gets the first substring matching the regex pattern;
## "[0-9]+" matches the first run of one or more digits
df[stringr::str_extract(df$Comment.Id, "[0-9]+") %in% num, ]
# Comment Comment.Id
#1 This comment 34_1_1_1
#3 Another comment 54_1_1
#4 Please comment 234
#5 More comments 13_1_1
#7 Comment again 119_1_1
Or in dplyr syntax:
library(dplyr)
library(stringr)
df %>% filter(str_extract(Comment.Id, "[0-9]+") %in% num)
Or as Sotos commented, without any packages we can use:
## here, sub() removes all stuff after the first '_'
df[sub('_.*', '', df$Comment.Id) %in% num, ]
## R's native forward pipe operator, available since R 4.1.0;
## subset() evaluates Comment.Id inside df, so no df$ prefix is needed
df |> subset(sub('_.*', '', Comment.Id) %in% num)
Note: I did not wrap sub() or str_extract() in as.numeric(), because the code works without it: %in% coerces num to character before comparing. Still, it is good practice to make this type conversion explicit.
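A minimal sketch of what the explicit conversion could look like, using the base R approach and the sample data from the question. The variable names prefix and result, and the leading-zero id "034", are hypothetical and only there for illustration; they are not part of the original data.

```r
# Sample data from the question
df <- data.frame(
  Comment = c("This comment", "That comment", "Another comment",
              "Please comment", "More comments", "Many comments",
              "Comment again"),
  Comment.Id = c("34_1_1_1", "24", "54_1_1", "234", "13_1_1",
                 "12", "119_1_1"),
  stringsAsFactors = FALSE
)
num <- c(34, 54, 234, 13, 119)

# Keep only the part of each id before the first underscore
prefix <- sub("_.*", "", df$Comment.Id)

# Explicit conversion: compare numbers with numbers
result <- df[as.numeric(prefix) %in% num, ]
print(result)

# Hypothetical edge case: an id prefix with a leading zero.
# Without as.numeric(), num is coerced to character ("34"),
# so the strings "034" and "34" do not match.
"034" %in% num              # FALSE
as.numeric("034") %in% num  # TRUE
```

With the data in the question both versions return the same five rows; the conversion would only matter if the id prefixes were formatted differently from the numbers in num, as in the leading-zero case.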