I am working with the R programming language.
I am trying to find the position at which a certain pattern (e.g. ABCD) first appears in a random string (e.g. ACABCDCDBCABCDBC, answer = 6). I wrote the following code to do this:
library(stringr)
letters <- c("A", "B", "C", "D")
results <- list()
for (i in 1:100)
{
  iteration_i = i
  letters_i = paste(sample(letters, 100, replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25)),collapse="")
  position_i = str_locate(letters_i, "ADBC")
  results_tmp = data.frame(iteration_i , letters_i, position_i)
  results[[i]] <- results_tmp
}
results_df <- do.call(rbind.data.frame, results)
The result currently looks something like this (note: I don't think this is correct. In row 5, I see ABCD near the beginning of the row, but it is being recorded as NA for some reason):
iteration_i letters_i start end
1 1 BACDCCCDCCCDCDDBBCBBAACACBBBBAAABDDDACAABDDABBABADCDDCDACCBBBCABCDABCDCCCDADDDBADBDCADAABDBDCDCAACCB NA NA
2 2 CACACCCCDCCBADACBBAADBCABBAAAAADBDDBCADCAAADADAAABDCABBAABABBCBDADCDDDDCDBADDBDCBCDDDBDCDDAACBBBBACA 20 23
3 3 CDCBDAABDDDDADBAAABBADAADBDDDBDADDCABADDDCDABBBCBCBBACBBDADABBCDCCACDBCDCDDBDBADBCDCADDADDDBDBAAABBD 79 82
4 4 ADBCDBADADBAAACAADACACACACBDDCACBDACCBDAAABDBAAAABBCCDBADADDADCBCABCBAABDCBCDCDACDCCDBADCBDDAADBCDAC 1 4
5 5 D**ABCD**DDCCBCDABADBBBBCDBCADCBBBDCAAACACCCBCBCADBDDABBACACBDABAAACCAAAAACCCCBCBCCABABDDADBABDDDCCDDCCC NA NA
6 6 DDDDDBDDDDBDDDABDDADAADCABCDAABBCCCDAABDDAACBDABBBBBABBCBDADBDCCAAADACCBCDDBDCAADCBBBCACDBBADDDDCABC NA NA
Currently, I am only generating 100 letters and hoping that this is enough to observe the desired pattern (sometimes it isn't; notice the NAs). Is there a way to add a WHILE loop to what I have written so that it keeps generating letters until the desired pattern first appears?
Can someone please show me how to do this?
Thanks!
CodePudding user response:
The loop is a repeat loop, not while, that only breaks when the pattern is found. I have set the length of the results list to 2; there's no point in making it bigger just to test the code.
library(stringr)
Letters <- c("A", "B", "C", "D")
Pattern <- "ADBC"
n <- 2L
set.seed(2022)
results <- vector("list", length = n)
for (i in seq.int(n)) {
  # keep drawing fresh 100-letter strings until one contains the pattern
  repeat {
    l <- sample(Letters, 100, replace = TRUE, prob=c(0.25, 0.25, 0.25, 0.25))
    letters_i <- paste(l, collapse = "")
    # str_locate returns a 1-row matrix (start, end); both are NA when there is no match
    position_i <- str_locate(letters_i, pattern = Pattern)
    if(any(!is.na(position_i))) break
  }
  results_tmp <- data.frame(iteration = i, letters = letters_i, position_i)
  results[[i]] <- results_tmp
}
results_df <- do.call(rbind.data.frame, results)
results_df
#> iteration letters start end
#> 1 1 ADBDBDBBCABBBDDBADDAADCBBADACACDCCBBADAADCDDABADCABCDCDDCCCBDDAABACCBDAAAADBDDCCCCADBCBBDABBDCCCBADD 83 86
#> 2 2 DDBDBDBCDDBDBBBDBABBCCBBCCBDBDABBAAABACABADCCBBABADBCCCDABABBDBADCADCABDDDAAACCBDCAACACACBBDDDACCDDC 50 53
Created on 2022-06-11 by the reprex package (v2.0.1)
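
Two side notes on the above. First, both the question's code and this answer search for "ADBC", not "ABCD", which is why the ABCD highlighted in row 5 of the question is reported as NA. Second, the repeat loop redraws a whole 100-letter string each time, so the reported position can never exceed 100. If what you want is a single string that keeps growing until the pattern shows up, something along these lines might work. It is only a sketch in base R: the function name first_occurrence and its arguments are made up for illustration, and it assumes all letters are equally likely.

```r
# Hypothetical helper (not part of the answer above): draw letters one at a
# time until `pattern` has appeared, then report where it starts and ends.
first_occurrence <- function(pattern, alphabet = c("A", "B", "C", "D")) {
  target <- strsplit(pattern, "")[[1]]   # pattern as a vector of single letters
  k <- length(target)
  drawn <- character(0)
  found <- FALSE
  while (!found) {
    drawn <- c(drawn, sample(alphabet, 1))   # append one uniformly drawn letter
    n <- length(drawn)
    # The pattern can only be completed by the newest letter,
    # so it is enough to compare the last k letters with the target.
    found <- n >= k && all(drawn[(n - k + 1):n] == target)
  }
  n <- length(drawn)
  data.frame(letters = paste(drawn, collapse = ""),
             start = n - k + 1, end = n,
             stringsAsFactors = FALSE)
}

set.seed(2022)
res <- first_occurrence("ADBC")
res[, c("start", "end")]
```

Because the loop stops as soon as the pattern is completed, the string always ends with the pattern and end equals the total number of letters drawn. For a pattern like ADBC that does not overlap with itself, the expected waiting time with four equally likely letters is 4^4 = 256 draws, so a fixed length of 100 will often be too short. To collect many runs, the function can be called inside the same kind of for loop as above and the rows combined with do.call(rbind, ...).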