I have a unique sequence (pattern) that I need to detect in a data frame. The sequence is 1-2-3-4-5-6, in that specific order. I need to count how many times the sequence was interrupted (broken); we know it was interrupted whenever the character "X" appears. For example, if I have the sequence:
1, 2, 3, 4, X, 5, 6
it means that the sequence was broken after "4"; another way to say it is "it was broken between stages 4 and 5".
The objective is to quantify how many times the sequence was broken after each stage, that is, how many times the X appeared after the character 1, how many times after the character 2, and so on.
Let's say we have the following dataset:
sample<-c(1,2,"X",3,4,5,6,1,2,3,4,5,"X",6,1,2,3,4,"X",5,6,1,"X",2,3,4,5,6,1,2,3,4,5,6)
Then I can say that the sequence was broken (n times):
After stage 1 = 1 time
After stage 2 = 1 time
After stage 3 = 0 times
After stage 4 = 1 time
After stage 5 = 1 time
After stage 6 = 0 times
Thank you so much for the help. I am trying to come up with a solution that will suit a large dataset, but I am still learning. If you don't know the answer but can point me to books, blogs, or documentation for relevant functions, that would be great!
CodePudding user response:
Just count the occurrences:
table(sample[which(sample == "X")-1])
# 1 2 4 5
# 1 1 1 1
Count the occurrences, including 0s for the stages that never precede an "X":
table(c(unique(setdiff(sample, "X")), sample[which(sample == "X")-1])) - 1
# 1 2 3 4 5 6
# 1 1 0 1 1 0
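A variation on the same idea, as a minimal sketch: instead of padding the vector and subtracting 1, one can convert the values found before each "X" to a factor with explicit levels, so that table() reports a 0 for stages that never occur. The names stages, before_x and counts are just for illustration, and the code assumes the stages are exactly 1 through 6, as in the question.

```r
# Same data as in the question
sample <- c(1,2,"X",3,4,5,6,1,2,3,4,5,"X",6,1,2,3,4,"X",5,6,1,"X",2,3,4,5,6,1,2,3,4,5,6)

# Assumption: the full set of stages is known in advance (1 through 6)
stages <- as.character(1:6)

# Element immediately before each "X"
before_x <- sample[which(sample == "X") - 1]

# Explicit factor levels make table() keep the zero counts
counts <- table(factor(before_x, levels = stages))
counts
# 1 2 3 4 5 6
# 1 1 0 1 1 0
```

This avoids the "add every value once, then subtract 1" trick, at the cost of having to spell out the stages up front.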
FYI, the use of which(.)-1 omits a count if the first "X" occurrence is the first element in sample. Since you said you needed to know the stages after which the "X" occurs, this does not appear to be a problem. If it is, one could always preface sample with a canary value of sorts, a la
table(c("OOPS", unique(setdiff(sample, "X")), c("OOPS", sample)[which(c("OOPS", sample) == "X")-1])) - 1
# 1 2 3 4 5 6 OOPS
# 1 1 0 1 1 0 0
If sample does start with "X", the canary picks it up:
sample[1] <- "X"
table(c("OOPS", unique(setdiff(sample, "X")), c("OOPS", sample)[which(c("OOPS", sample) == "X")-1])) - 1
# 1 2 3 4 5 6 OOPS
# 1 1 0 1 1 0 1
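If the canary feels awkward, a lagged copy of the vector gives a similar result. This is only a sketch under assumptions: count_breaks is a hypothetical helper (not from any package), and the stages are assumed to be 1 through 6. An "X" with no valid stage before it (at the very start, or right after another "X") ends up in the <NA> column.

```r
# Hypothetical helper: count which stage immediately precedes each marker
count_breaks <- function(x, stages = 1:6, marker = "X") {
  prev <- c(NA, head(x, -1))      # element preceding each position; NA for the first
  before <- prev[x == marker]     # what came right before each marker
  # Values outside `stages` (NA, or another marker) are reported under <NA>
  table(factor(before, levels = as.character(stages)), useNA = "ifany")
}

sample <- c(1,2,"X",3,4,5,6,1,2,3,4,5,"X",6,1,2,3,4,"X",5,6,1,"X",2,3,4,5,6,1,2,3,4,5,6)
res_plain <- count_breaks(sample)
res_plain
#    1    2    3    4    5    6
#    1    1    0    1    1    0

sample[1] <- "X"                  # now the vector starts with a marker
res_lead <- count_breaks(sample)
res_lead
#    1    2    3    4    5    6 <NA>
#    1    1    0    1    1    0    1
```

Everything here is vectorised, so it should scale to a large vector without an explicit loop.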
CodePudding user response:
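We may also do this with tabulate, after converting the elements just before each "X" to integers (nbins = 6 keeps the trailing 0 for stage 6):
tabulate(as.integer(sample[which(sample == "X") - 1]), nbins = 6)
# [1] 1 1 0 1 1 0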