Filter for rows that start with a number stored in a vector of numbers

I would like to filter a data frame, keeping only the rows where a column's value starts with a number that also appears in a numeric vector.

For example, consider the data frame below, with columns Comment and Comment.Id:

Comment         | Comment.Id
This comment    | 34_1_1_1
That comment    | 24
Another comment | 54_1_1
Please comment  | 234
More comments   | 13_1_1
Many comments   | 12
Comment again   | 119_1_1

And the following vector num:

num <- c(34, 54, 234, 13, 119)

I would like to look through the Comment.Id column, and if a comment Id starts with a number that is contained in the num vector, then filter for that row.

The resulting data frame would look like this:

Comment         | Comment.Id
This comment    | 34_1_1_1
Another comment | 54_1_1
Please comment  | 234
More comments   | 13_1_1
Comment again   | 119_1_1

I am using the R language.

CodePudding user response:

df <- structure(list(Comment = c("This comment", "That comment", "Another comment", 
"Please comment", "More comments", "Many comments", "Comment again"
), Comment.Id = c("34_1_1_1", "24", "54_1_1", "234", "13_1_1", 
"12", "119_1_1")), row.names = c(NA, -7L), class = "data.frame")

num <- c(34, 54, 234, 13, 119)

How about:

## str_extract() returns the first substring matching the regex;
## "[0-9]+" matches one or more consecutive digits
df[stringr::str_extract(df$Comment.Id, "[0-9]+") %in% num, ]
#          Comment Comment.Id
#1    This comment   34_1_1_1
#3 Another comment     54_1_1
#4  Please comment        234
#5   More comments     13_1_1
#7   Comment again    119_1_1

Or in dplyr syntax:

library(dplyr)
library(stringr)
df %>% filter(str_extract(Comment.Id, "[0-9]+") %in% num)

Or as Sotos commented, without any packages we can use:

## here, sub() removes everything from the first '_' onward
df[sub('_.*', '', df$Comment.Id) %in% num, ]

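## the same filter with R's native forward pipe operator |> (available since R 4.1.0);
## subset() evaluates Comment.Id inside the data frame, so df$ is not needed
df |> subset(sub('_.*', '', Comment.Id) %in% num)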
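
Another base R option, shown only as a sketch, is to build one anchored regular expression from num and test each ID with grepl(). The variable name pattern is my own, and this assumes the values in num are whole numbers that convert to plain digit strings (a value such as 1e5 would become "1e+05" and would not work).

```r
df <- data.frame(
  Comment = c("This comment", "That comment", "Another comment",
              "Please comment", "More comments", "Many comments",
              "Comment again"),
  Comment.Id = c("34_1_1_1", "24", "54_1_1", "234", "13_1_1", "12", "119_1_1"),
  stringsAsFactors = FALSE
)
num <- c(34, 54, 234, 13, 119)

## "^" anchors the match at the start of the ID; "(_|$)" requires the number
## to be followed by '_' or the end of the string, so 34 cannot match "341_1"
pattern <- paste0("^(", paste(num, collapse = "|"), ")(_|$)")

result <- df[grepl(pattern, df$Comment.Id), ]
print(result)
```

Unlike the str_extract() version, this only matches a number at the very start of Comment.Id, which is what the question asks for.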

Note:

I did not wrap sub() or str_extract() in as.numeric(), because the code works without it: %in% coerces num to character before comparing. Still, it is good practice to make this type conversion explicit.
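
As a small illustration of why the explicit conversion can matter, here is a sketch using hypothetical IDs that are not in the original data. It assumes an ID such as "034_1" should count as starting with 34; if that is not the case for your data, the string comparison may be what you want.

```r
num <- c(34, 54, 234, 13, 119)

## hypothetical IDs, one with a leading zero
ids <- c("034_1", "34_2", "24")

## keep only the part before the first '_'
prefix <- sub("_.*", "", ids)

## compared as strings: "034" is not the same string as "34"
by_string <- prefix %in% num

## compared as numbers: as.numeric("034") is 34
by_number <- as.numeric(prefix) %in% num

print(by_string)
print(by_number)
```

Note that as.numeric() returns NA with a warning for a prefix that is not a number, so this version assumes every ID begins with digits.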