This is an example of data:
exp_data <- structure(list(Seq = c("AAAARVDS", "AAAARVDSSSAL",
"AAAARVDSRASDQ"), Change = structure(c(19L, 20L, 13L), .Label = c("",
"C[ 58]", "C[ 58], F[ 1152]", "C[ 58], F[ 1152], L[ 12], M[ 12]",
"C[ 58], L[ 2909]", "L[ 12]", "L[ 370]", "L[ 504]", "M[ 12]",
"M[ 1283]", "M[ 1457]", "M[ 1491]", "M[ 16]", "M[ 16], Y[ 1013]",
"M[ 16], Y[ 1152]", "M[ 16], Y[ 762]", "M[ 371]", "M[ 386], Y[ 12]",
"M[ 486], W[ 12]", "Y[ 12]", "Y[ 1240]", "Y[ 1502]", "Y[ 1988]",
"Y[ 2918]"), class = "factor"), `Mass` = c(1869.943,
1048.459, 707.346), Size = structure(c(2L, 2L, 2L), .Label = c("Matt",
"Greg",
"Kieran"
), class = "factor"), `Number` = c(2L, 2L, 2L)), row.names = c(244L,
392L, 396L), class = "data.frame")
I would like to draw your attention to the column named Change, as this is the one I would like to use for filtering. There are three rows here, and I would like to keep only the first one, because it contains a change greater than 100 for a specific letter. In general, I would like to keep every row that contains a change greater than 100 for some letter. The Change column may list up to 4-5 letters, but if at least one of them has a modification of at least 100, I would like to keep that row.
Is there a simple solution for this?
Expected output:
         Seq          Change     Mass Size Number
244 AAAARVDS M[ 486], W[ 12] 1869.943 Greg      2
CodePudding user response:
I'm not entirely sure I understood your problem statement correctly, but perhaps something like this:
library(dplyr)
library(stringr)
exp_data %>% filter(str_detect(Change, "\\d{3}"))
# Seq Change Mass Size Number
#1 AAAARVDS M[ 486], W[ 12] 1869.943 Greg 2
Or the same in base R
exp_data[grep("\\d{3}", exp_data$Change), ]
#          Seq          Change     Mass Size Number
#244 AAAARVDS M[ 486], W[ 12] 1869.943 Greg      2
The idea is to use a regular expression to keep only those rows where Change contains at least three consecutive digits, i.e. at least one number of 100 or more (assuming the numbers have no leading zeros).
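To see what the pattern does and does not match, here is a small check on made-up strings (the vector `changes` is hypothetical and only mimics the format of the Change column). Note that `\\d{3}` also matches inside four-digit numbers, and that it keeps a value of exactly 100, so it implements "at least 100" rather than "strictly greater than 100".

```r
# Hypothetical sample values in the same format as the Change column
changes <- c("M[ 486], W[ 12]", "Y[ 12]", "M[ 16]",
             "C[ 58], L[ 2909]", "L[ 100]", "")

# TRUE where the string contains at least three consecutive digits
has_three_digits <- grepl("\\d{3}", changes)
has_three_digits
# [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
```

This shortcut only works because the cutoff is a power of ten; for a cutoff such as 250 the numbers would have to be parsed and compared numerically.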
CodePudding user response:
You can use str_extract_all from the stringr package.
library(stringr)
data.table solution
library(data.table)
setDT(exp_data)
# Extract every run of digits from Change and store the largest one per Seq
exp_data[, max := max(as.numeric(str_extract_all(Change, "[[:digit:]]+")[[1]])), by = Seq]
exp_data[max > 100, ]
Seq Change Mass Size Number max
1: AAAARVDS M[ 486], W[ 12] 1869.9 Greg 2 486
dplyr solution
library(dplyr)
exp_data %>%
  group_by(Seq) %>%
  # Keep rows whose largest number in Change exceeds 100
  filter(max(as.numeric(str_extract_all(Change, "[[:digit:]]+")[[1]])) > 100)
# A tibble: 1 x 5
# Groups: Seq [1]
Seq Change Mass Size Number
<chr> <fct> <dbl> <fct> <int>
1 AAAARVDS M[ 486], W[ 12] 1870. Greg 2
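Both versions above group by Seq and look only at the first element of each group (`[[1]]`), which works here because every Seq is unique. Below is a minimal base R sketch that works row by row instead, assuming the numbers in Change are non-negative integers. The helper name `max_change`, the `threshold` variable and the small `demo` data frame are illustrative only and not part of any package.

```r
# Hypothetical helper: largest number found in each Change string
# (NA when a string contains no digits at all)
max_change <- function(change) {
  change <- as.character(change)  # factor -> character
  nums <- regmatches(change, gregexpr("[[:digit:]]+", change))
  vapply(nums,
         function(x) if (length(x)) max(as.numeric(x)) else NA_real_,
         numeric(1))
}

# Small stand-in for the question's data (same Seq and Change values)
demo <- data.frame(
  Seq = c("AAAARVDS", "AAAARVDSSSAL", "AAAARVDSRASDQ"),
  Change = factor(c("M[ 486], W[ 12]", "Y[ 12]", "M[ 16]"))
)

threshold <- 100
m <- max_change(demo$Change)

# Keep rows whose largest change is strictly greater than the threshold
kept <- demo[!is.na(m) & m > threshold, ]
kept
```

Replace `>` with `>=` if a change of exactly 100 should also be kept.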