How to sample rowname-colname pairs from a crosstab (or dimname groups from an n-dimensional array)

Time:12-03

In R it is trivial to "collapse" an n-dimensional array into a one-dimensional vector and sample from that using, e.g., the sample() function in base R.

However, I would like to sample dimnames-groups (i.e. rowname-colname pairs in case of a two-dimensional array) based on the frequencies.

As an example, assume we have the following crosstab (the data (n=70) is randomly generated):

              Man  Woman
Smoking        10     20
Non-smoking    15     25

How do I sample from this so that I get:

  • "Smoking Man" with probability: 10 / 70
  • "Non-smoking Man" with probability: 15 / 70
  • "Smoking Woman" with probability: 20 / 70
  • "Non-smoking Woman" with probability: 25 / 70

The easiest way would probably be to group the dimnames (somehow) and use the result as the first argument of the sample() function, i.e.:

sample(x = vectorOfGroupedDimnames, size = 1, prob = c(crosstabAsMatrix))

I know that the variable vectorOfGroupedDimnames can be built with nested for loops, but there has to be a more elegant way of doing this.

So what is the easiest way to do this? Thanks.

CodePudding user response:

Maybe this will help you:

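library(dplyr)

# Crosstab from the question, stored as a 2 x 2 table of counts
data <-
  structure(c(25L, 20L, 15L, 10L), .Dim = c(2L, 2L), .Dimnames = list(
    smoke = c("Non-smoking", "Smoking"), sex = c("Female", "Male"
    )), class = "table")

# as_tibble() gives one row per cell (columns smoke, sex, n);
# sample_n() then draws one row with probability proportional to n
data %>%
  as_tibble() %>%
  sample_n(size = 1, weight = n, replace = TRUE)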
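
If you would rather stay in base R, something along these lines might also work. It is only a sketch: the names tab, cells, cell_labels and draw are made up for illustration, and it assumes the array has dimnames on every dimension. expand.grid() varies the first dimension fastest, which is the same column-major order that c(tab) uses, so the labels and the counts line up without any loops.

```r
# Crosstab from the question (counts sum to 70)
tab <- matrix(c(10, 15, 20, 25), nrow = 2,
              dimnames = list(smoke = c("Smoking", "Non-smoking"),
                              sex   = c("Man", "Woman")))

# One row per cell, in the same order as c(tab)
cells <- expand.grid(dimnames(tab), stringsAsFactors = FALSE)

# Paste the dimnames of each cell into a single label
cell_labels <- do.call(paste, c(cells, sep = " "))

# sample() rescales prob itself, so raw counts can be passed directly
draw <- sample(cell_labels, size = 1, prob = c(tab))
draw
```

Because expand.grid() accepts a list with any number of elements, the same three steps should carry over to an n-dimensional array.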