How can I create a column using dplyr::mutate that randomly samples using column values as probabili

Time:02-21

Suppose I have a tibble in which each row is a set of probabilities that add up to 1. For example,

library(dplyr)

probs <- tibble(A = c(0.1, 0.5, 0.6),
                B = c(0.5, 0.2, 0.1),
                C = c(0.4, 0.3, 0.3))
probs
# A tibble: 3 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1   0.1   0.5   0.4
2   0.5   0.2   0.3
3   0.6   0.1   0.3

I'd like to create a column with mutate that randomly selects a letter based on the probabilities in each row. This would be my best guess:

probs %>%
    mutate(random_outcome = sample(LETTERS[1:3], size = 1, prob = c(A, B, C)))

But this generates an error:

Error: Problem with `mutate()` column `random_outcome`.
i `random_outcome = sample(LETTERS[1:3], size = 1, prob = c(A, B, C))`.
x incorrect number of probabilities
Run `rlang::last_error()` to see where the error occurred.

Answer:

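mutate() evaluates its expressions on whole columns, so c(A, B, C) is a length-9 vector rather than the three probabilities of a single row, which is why sample() reports an incorrect number of probabilities. Use rowwise() in combination with c_across() so the expression is evaluated once per row: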

library(tidyverse)
set.seed(1)

probs %>%
  rowwise() %>%
  mutate(random_outcome = sample(LETTERS[1:3], size = 1, prob = c_across(c(A, B, C)))) %>%
  ungroup()

# A tibble: 3 x 4
      A     B     C random_outcome
  <dbl> <dbl> <dbl> <chr> 
1   0.1   0.5   0.4 B     
2   0.5   0.2   0.3 A     
3   0.6   0.1   0.3 A 
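
If you would rather not depend on rowwise(), the same per-row sampling can probably be done in base R with apply(). This is only a sketch of an alternative, not part of the answer above; it assumes the probability columns are exactly A, B and C and that the column names are the labels you want to draw. The sampled letters depend on the seed, so no output is shown.

```r
# Same probabilities as in the question, stored as a plain data frame
# so that nothing beyond base R is needed.
probs <- data.frame(A = c(0.1, 0.5, 0.6),
                    B = c(0.5, 0.2, 0.1),
                    C = c(0.4, 0.3, 0.3))

set.seed(1)

# apply() with MARGIN = 1 passes each row to the function as a named
# numeric vector, so names(p) gives the letters and p the probabilities.
probs$random_outcome <- apply(
  probs[c("A", "B", "C")], 1,
  function(p) sample(names(p), size = 1, prob = p)
)

probs
```

Selecting the columns explicitly with probs[c("A", "B", "C")] keeps any other columns in the data frame out of the probability vector.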