I have this data I randomly created:
set.seed(999)
col1 = sample.int(5, 200, replace = TRUE)
col2 = sample.int(5, 200, replace = TRUE)
col3 = sample.int(5, 200, replace = TRUE)
col4 = sample.int(5, 200, replace = TRUE)
col5 = sample.int(5, 200, replace = TRUE)
col6 = sample.int(5, 200, replace = TRUE)
col7 = sample.int(5, 200, replace = TRUE)
col8 = sample.int(5, 200, replace = TRUE)
col9 = sample.int(5, 200, replace = TRUE)
col10 = sample.int(5, 200, replace = TRUE)
d = data.frame(
  id = 1:10,
  seq = c(
    paste(col1, collapse = ""), paste(col2, collapse = ""),
    paste(col3, collapse = ""), paste(col4, collapse = ""),
    paste(col5, collapse = ""), paste(col6, collapse = ""),
    paste(col7, collapse = ""), paste(col8, collapse = ""),
    paste(col9, collapse = ""), paste(col10, collapse = "")
  )
)
For each row, I want to record all positions where the pattern "11" appears. I found something similar in the "stringr" library:
library(stringr)
pattern = "11"
d$position <- str_locate(d$seq, pattern = pattern)
This "kinda" worked, but it only recorded the first instance of the desired pattern. For each row, is there some function I can use that will record:
- The positions at which the desired pattern appears (in each row)
- The number of times the desired pattern appears (in each row)?
Thank you!
CodePudding user response:
Try gregexpr(), which returns the starting positions of every match in each string (one list element per row):
gregexpr("11", d$seq)
[[1]]
[1] 4 29 96 99 144 167 191
attr(,"match.length")
[1] 2 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[2]]
[1] 68 75 119 123 192 198
attr(,"match.length")
[1] 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[3]]
[1] 20 31 59 75 82 118 165 182
attr(,"match.length")
[1] 2 2 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[4]]
[1] 18 25 31 42 46 61 198
attr(,"match.length")
[1] 2 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[5]]
[1] 36 48 64 87 133 135 139 146 187 190
attr(,"match.length")
[1] 2 2 2 2 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[6]]
[1] 17 31 86 141 195 197
attr(,"match.length")
[1] 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[7]]
[1] 40 78 156 179
attr(,"match.length")
[1] 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[8]]
[1] 20 23 64 90 119 126 129 131 196
attr(,"match.length")
[1] 2 2 2 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[9]]
[1] 3 10 34 80 114 122 185 195
attr(,"match.length")
[1] 2 2 2 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[[10]]
[1] 92 136 149 176 180
attr(,"match.length")
[1] 2 2 2 2 2
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
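The gregexpr() result above still carries attributes and does not directly give the per-row counts asked for in the question. A minimal sketch of one way to get both, using a small made-up vector `seqs` (an assumption for illustration; substitute `d$seq`). Note that gregexpr() reports non-overlapping matches, so "1111" counts as two matches; the lookahead variant at the end is one possible way to count overlapping ones.

```r
# Toy input (hypothetical); replace with d$seq from the question.
seqs <- c("2113114", "55555", "1111")

# fixed = TRUE treats "11" as a literal string rather than a regex.
m <- gregexpr("11", seqs, fixed = TRUE)

# Strip the attributes and turn the "no match" marker (-1) into an empty vector.
positions <- lapply(m, function(x) {
  x <- as.integer(x)
  x[x > 0]
})

# Number of (non-overlapping) matches per string.
counts <- lengths(positions)

# Optional: overlapping matches via a zero-width lookahead (needs perl = TRUE).
m_overlap <- gregexpr("(?=11)", seqs, perl = TRUE)
positions_overlap <- lapply(m_overlap, function(x) {
  x <- as.integer(x)
  x[x > 0]
})
```

With the data frame from the question, `d$n_matches <- counts` would store the counts, and `d$positions <- positions` would store the positions as a list column.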
CodePudding user response:
You can use str_locate_all(), which returns a list with one matrix per string, giving the start and end position of every match. For your case:
stringr::str_locate_all(d$seq, '11')
[[1]]
start end
[1,] 4 5
[2,] 29 30
[3,] 96 97
[4,] 99 100
[5,] 144 145
[6,] 167 168
[7,] 191 192
[[2]]
start end
[1,] 68 69
[2,] 75 76
[3,] 119 120
[4,] 123 124
[5,] 192 193
[6,] 198 199
...
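To also get the number of matches per row, stringr has str_count(). A short sketch, again on a made-up vector `seqs` (an assumption; substitute `d$seq`), that pulls out just the start positions and the counts. Like the calls above, this counts non-overlapping matches.

```r
library(stringr)

# Toy input (hypothetical); replace with d$seq from the question.
seqs <- c("2113114", "55555", "1111")

# One matrix per string, with columns "start" and "end".
loc <- str_locate_all(seqs, fixed("11"))

# Keep only the start positions; strings without a match give an empty vector.
starts <- lapply(loc, function(m) m[, "start"])

# Number of matches per string.
counts <- str_count(seqs, fixed("11"))
```

The counts could then be stored with something like `d$n_matches <- str_count(d$seq, fixed("11"))`.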