Sum all ways to win X of last N games in R or Python-CodePudding

I've been doing this manually in Excel, but want a way to scale it and figure R or Python is my best bet. That said, I have no idea where to start. Here's an example of the table in Excel:

PROB_WIN and PROB_LOSE are the respective probabilities of a team winning or losing said game. Column T_WINZERO is the probability the team won none of their previous three games, T_WINONE is the probability they won one of their previous three games, etc. For instance, the fifth row in T_WINZERO is simply .493 * .466 * .484 (equals .111). One can imagine how this would be difficult to scale once their becomes nearly countless ways to win, for example, 7 out of a team's last 15 games (6,435 ways to be exact).

What I'd like to do is be able to create columns automatically for a team's last N games with these new probabilities (e.g., 0 of last 3, 1 of last 3, 2 of last 3, 3 of last 3 like shown above but for any number of games) using columns PROB_WIN and PROB_LOSE to do so.

In Excel, I'm summing all the probabilities of all the ways winning X of last N games can happen and then dragging the formula down so I'm looking for a similar output in R or Python.

Thanks in advance for the help!

CodePudding user response：

Here is one way.

Rather than transcribing your picture, I created a similar but reproducible set of win probabilities (note that we don't need a column of lose probabilities since these are just the complement of win probabilities)

set.seed(2)

df <- data.frame(GAME = 1:10, PROB_WIN = runif(10, 0.45, 0.55))

df
#>    GAME  PROB_WIN
#> 1     1 0.4684882
#> 2     2 0.5202374
#> 3     3 0.5073326
#> 4     4 0.4668052
#> 5     5 0.5443839
#> 6     6 0.5443475
#> 7     7 0.4629159
#> 8     8 0.5333449
#> 9     9 0.4968019
#> 10   10 0.5049984

We can write a function that takes a vector of win probabilities and returns the probability of having n wins, taking all possible game sequences into account, like this:

won_n_games <- function(probs, n_wins) {
  
  wins <- asplit(combn(length(probs), n_wins), 2)
  losses <- lapply(wins, function(x) setdiff(seq_along(probs), x))
  sum(unlist(Map(function(wins, losses) {
    prod(probs[wins]) * prod(1 - probs[losses])
    },
      wins = wins, losses = losses)))
}

We can use this in another function which will create a column of probabilities for winning a given number of the last n games like this:

make_column <- function(probs, n, wins) {
  
  result <- rep(NA_real_, length(probs))
  for(i in (n   1):length(probs)) {
    result[i] <- won_n_games(probs[(i - n):(i - 1)], wins)
  }
  result
}

So to create the columns in your data frame we can do:

df$WIN_0_3 <- make_column(df$PROB_WIN, n = 3, wins = 0)
df$WIN_1_3 <- make_column(df$PROB_WIN, n = 3, wins = 1)
df$WIN_2_3 <- make_column(df$PROB_WIN, n = 3, wins = 2)
df$WIN_3_3 <- make_column(df$PROB_WIN, n = 3, wins = 3)

Resulting in

df
#>    GAME  PROB_WIN   WIN_0_3   WIN_1_3   WIN_2_3   WIN_3_3
#> 1     1 0.4684882        NA        NA        NA        NA
#> 2     2 0.5202374        NA        NA        NA        NA
#> 3     3 0.5073326        NA        NA        NA        NA
#> 4     4 0.4668052 0.1256299 0.3763316 0.3743888 0.1236497
#> 5     5 0.5443839 0.1260277 0.3767748 0.3739920 0.1232055
#> 6     6 0.5443475 0.1196847 0.3710328 0.3803585 0.1289240
#> 7     7 0.4629159 0.1106926 0.3614084 0.3895687 0.1383303
#> 8     8 0.5333449 0.1115001 0.3625303 0.3887919 0.1371777
#> 9     9 0.4968019 0.1142016 0.3653846 0.3860178 0.1343960
#> 10   10 0.5049984 0.1261181 0.3773585 0.3738661 0.1226573

^{Created on 2022-03-19 by the reprex package (v2.0.1)}