R: Counting the Frequencies of Coin Flips

I am working with the R programming language.

I simulated a dataset of 1000 coin flips, then counted how often each "2 Flip Sequence" occurs:

Coin <- c('H', 'T')
Results = sample(Coin,1000, replace = TRUE)
My_Data = data.frame(id = 1:1000, Results)



Pairs = data.frame(first = head(My_Data$Results, -1), second = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second Freq
1     H      H  255
2     T      H  245
3     H      T  246
4     T      T  253

I am curious - is it possible to extend the above code for "3 Flip Sequences"?

For example - I tried modifying parts of the code to see how the results change (and hoped to stumble across the correct way to write this code):

# First Attempt
Pairs = data.frame(first = head(My_Data$Results, -1), second  = head(My_Data$Results, -1) , third = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second third Freq
1     H      H     H  255
2     T      H     H  245
3     H      T     H    0
4     T      T     H    0
5     H      H     T    0
6     T      H     T    0
7     H      T     T  246
8     T      T     T  253

# Second Attempt
Pairs = data.frame(first = head(My_Data$Results, -1), second  = tail(My_Data$Results, -1) , third = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second third Freq
1     H      H     H  255
2     T      H     H    0
3     H      T     H    0
4     T      T     H  245
5     H      H     T  246
6     T      H     T    0
7     H      T     T    0
8     T      T     T  253

I am not sure which of these options, if either, is correct.

In general, I would like to understand the logic for adapting the above code to sequences of arbitrary length (e.g. "4 flip sequences", "5 flip sequences", etc.).
Also, this might not be the most efficient way to calculate these frequencies. I would also be interested in other approaches that scale better as the overall size of the data increases.

Thanks!

CodePudding user response：

You could first cut the row indices along 3 + 1 breaks, then split Results along the resulting levels. The interaction of the pieces can then be tabled to get the result.

My_Data$cut3 <- cut(seq_len(nrow(My_Data)), seq.int(1, nrow(My_Data), length.out = 3 + 1), include.lowest = TRUE)

(res <- interaction(split(My_Data$Results, My_Data$cut3)) |> table() |> as.data.frame())

#    Var1 Freq
# 1 H.H.H   51
# 2 T.H.H   58
# 3 H.T.H   43
# 4 T.T.H   49
# 5 H.H.T   38
# 6 T.H.T   51
# 7 H.T.T   64
# 8 T.T.T   46

To get the desired output, we can strsplit Var1.

strsplit(as.character(res$Var1), '\\.') |> do.call(what=rbind) |>
  cbind.data.frame(res$Freq) |> setNames(c('first', 'second', 'third', 'Freq'))
#   first second third Freq
# 1     H      H    H   51
# 2     T      H    H   58
# 3     H      T    H   43
# 4     T      T    H   49
# 5     H      H    T   38
# 6     T      H    T   51
# 7     H      T    T   64
# 8     T      T    T   46

Note that the number of rows of your data should be divisible by 3, so that the three chunks have equal length.
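
To see which flips end up in the same combination under this cut/split approach, here is a minimal sketch on a toy vector. The vector `x` and the names `idx`, `groups` and `combos` are made up for illustration and are not part of the answer's code.

```r
# Toy data, made up for illustration: 6 flips, cut into 3 chunks of 2.
x <- c("H", "T", "T", "H", "H", "T")
idx <- seq_along(x)
ct <- cut(idx, seq.int(1, length(x), length.out = 3 + 1), include.lowest = TRUE)

# Positions that fall into each chunk: 1-2, 3-4, 5-6.
groups <- split(idx, ct)

# interaction() combines the chunks element-wise, so combination i joins the
# i-th flip of every chunk: positions (1, 3, 5) and (2, 4, 6).
combos <- as.character(interaction(split(x, ct)))
combos
# "H.T.H" "T.H.T"
```

In other words, each counted combination takes one flip from every chunk rather than three neighbouring flips, which is worth keeping in mind when comparing the totals with the pair counts in the question.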

Edit

To generalize, we may write a small function.

f <- \(x, n) {
  ct <- cut(seq_len(nrow(x)), seq.int(1L, nrow(x), length.out = n + 1L), include.lowest = TRUE)
  res <- interaction(split(x$Results, ct)) |> table() |> as.data.frame()
  strsplit(as.character(res$Var1), '\\.') |> do.call(what=rbind) |>
    cbind.data.frame(res$Freq) |> setNames(c(LETTERS[seq_len(n)], 'Freq'))
}

f(My_Data, 4)
#    A B C D Freq
# 1  H H H H   13
# 2  T H H H   25
# 3  H T H H   18
# 4  T T H H   17
# 5  H H T H   18
# 6  T H T H   15
# 7  H T T H   21
# 8  T T T H   24
# 9  H H H T   26
# 10 T H H T   15
# 11 H T H T   16
# 12 T T H T   18
# 13 H H T T   22
# 14 T H T T   18
# 15 H T T T   10
# 16 T T T T   24

Data:

set.seed(42)
My_Data <- data.frame(id=1:1200, Results=sample(c('H', 'T'), 1200, replace=TRUE))
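
For comparison, if consecutive overlapping triples are what is wanted (an assumption about the question's intent), the head/tail pattern from the question can probably be extended by shifting each column one position further. This is only a sketch. The names `x`, `Triples` and `Final3` are introduced here for illustration.

```r
set.seed(42)
My_Data <- data.frame(id = 1:1200, Results = sample(c("H", "T"), 1200, replace = TRUE))
x <- My_Data$Results

# Each column is the same vector shifted by one more position,
# so row i holds flips i, i + 1 and i + 2.
Triples <- data.frame(
  first  = head(x, -2),            # flips 1 .. n - 2
  second = head(tail(x, -1), -1),  # flips 2 .. n - 1
  third  = tail(x, -2)             # flips 3 .. n
)
Final3 <- as.data.frame(table(Triples))

# Sanity check: 1200 flips contain 1200 - 3 + 1 overlapping triples.
sum(Final3$Freq)
# 1198
```

Both attempts in the question reuse the same shifted vector for two columns, which would explain why half of the combinations there have a count of 0.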

CodePudding user response：

It might be helpful to work with strings.

coin <- c("H", "T")
results <- sample(coin, 1000, replace = TRUE)

Then to get sequence counts (assuming overlapping sequences also count) for triples, we could do something like:

triples <- table(
  sapply(
    1:(length(results) - 2),  # the last triple starts at position length(results) - 2
    function(i) sprintf(
      "%s%s%s",
      results[i],
      results[i + 1],
      results[i + 2]
    )
  )
)

which gives me something like:

HHH HHT HTH HTT THH THT TTH TTT 
132 129 138 115 129 124 116 114

This idea could be generalized fairly easily, for example:

n_sequences <- function(n, results) {
  helper <- function(i, n) if (n < 1) "" else sprintf(
    "%s%s", 
    helper(i, n - 1), 
    results[i + n - 1]
  )
  result <- data.frame(
    table(
      sapply(
        1:(length(results) - n + 1),
        function(i) helper(i, n)
      )
    )
  )
  colnames(result) <- c("Sequence", "Frequency")
  result
}

For example:

n_sequences(5, results)

Gives me something like:

   Sequence Frequency
1     HHHHH        34
2     HHHHT        31
3     HHHTH        36
4     HHHTT        31
5     HHTHH        35
6     HHTHT        36
7     HHTTH        20
8     HHTTT        37
9     HTHHH        35
10    HTHHT        34
11    HTHTH        41
12    HTHTT        27
13    HTTHH        27
14    HTTHT        24
15    HTTTH        34
16    HTTTT        30
17    THHHH        31
18    THHHT        36
19    THHTH        36
20    THHTT        26
21    THTHH        34
22    THTHT        32
23    THTTH        31
24    THTTT        27
25    TTHHH        32
26    TTHHT        28
27    TTHTH        25
28    TTHTT        31
29    TTTHH        33
30    TTTHT        31
31    TTTTH        30
32    TTTTT        20
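
On the efficiency part of the question: the same overlapping counts can likely be obtained without calling a function once per position, by pasting shifted copies of the whole vector together in a single vectorised call. The sketch below is one possible way to do that and has not been benchmarked here. `count_sequences` and `toy` are hypothetical names, not part of the answer above.

```r
# Count overlapping length-n sequences by pasting n shifted copies of the vector.
count_sequences <- function(results, n) {
  m <- length(results) - n + 1L  # number of overlapping windows
  stopifnot(m >= 1L)
  # Copy k holds the flips at positions k, k + 1, ..., k + m - 1.
  cols <- lapply(seq_len(n), function(k) results[k:(k + m - 1L)])
  out <- as.data.frame(table(do.call(paste0, cols)), stringsAsFactors = FALSE)
  colnames(out) <- c("Sequence", "Frequency")
  out
}

# Toy check: the windows of H H T H H are HH, HT, TH, HH.
toy <- c("H", "H", "T", "H", "H")
count_sequences(toy, 2)
#   Sequence Frequency
# 1       HH         2
# 2       HT         1
# 3       TH         1
```

For a vector of 1000 flips, `count_sequences(results, 5)` should return frequencies that sum to 996, the number of overlapping windows of length 5.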