tidyverse: Simulating random sample with nested factor

Time: 02-19

I want to simulate a random sample with a nested factor. Factor Dept has two levels, A and B. Level A has two nested levels, A1 and A2; level B has three nested levels, B1, B2 and B3. I want to simulate a random sample from 2022-01-01 to 2022-01-31 using R code. Not every level has to appear on every date: in the example below, A2 is missing on 2022-01-02 and B1 is missing on 2022-01-03. Part of the desired output is given below (2022-01-01 to 2022-01-03 only, for reference).

library(tibble)

set.seed(12345)
df1 <-
  tibble(
    Date   = c(rep("2022-01-01", 5), rep("2022-01-02", 4), rep("2022-01-03", 4))
  , Dept   = c("A", "A", "B", "B", "B", "A", "B", "B", "B", "A", "A", "B", "B")
  , Prog   = c("A1", "A2", "B1", "B2", "B3", "A1", "B1", "B2", "B3", "A1", "A2", "B2", "B3")
  , Amount = runif(n = 13, min = 50000, max = 100000) 
  )

df1
#> # A tibble: 13 x 4
#>    Date       Dept  Prog  Amount
#>    <chr>      <chr> <chr>  <dbl>
#>  1 2022-01-01 A     A1    86045.
#>  2 2022-01-01 A     A2    93789.
#>  3 2022-01-01 B     B1    88049.
#>  4 2022-01-01 B     B2    94306.
#>  5 2022-01-01 B     B3    72824.
#>  6 2022-01-02 A     A1    58319.
#>  7 2022-01-02 B     B1    66255.
#>  8 2022-01-02 B     B2    75461.
#>  9 2022-01-02 B     B3    86385.
#> 10 2022-01-03 A     A1    99487.
#> 11 2022-01-03 A     A2    51727.
#> 12 2022-01-03 B     B2    57619.
#> 13 2022-01-03 B     B3    86784.
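Because each Prog level belongs to exactly one Dept, the nesting can be stored as a small lookup table of the valid Dept/Prog pairs and crossed with the dates, so an invalid combination such as A3 is never created. The following is a minimal sketch, assuming the dplyr, tidyr and tibble packages are installed; the names progs, dates and full are illustrative only. It builds the complete grid (every valid level on every date) and does not yet drop any rows.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Valid (Dept, Prog) pairs: A contains A1, A2; B contains B1, B2, B3
progs <- tibble(
  Dept = c("A", "A", "B", "B", "B"),
  Prog = c("A1", "A2", "B1", "B2", "B3")
)

dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day")

set.seed(12345)
# Every date paired with every valid Prog; Dept is looked up from Prog
full <- crossing(Date = dates, Prog = progs$Prog) %>%
  left_join(progs, by = "Prog") %>%
  select(Date, Dept, Prog) %>%
  mutate(Amount = runif(n(), min = 50000, max = 100000))

full
```

With 31 dates and 5 valid levels, full has 155 rows; a per-date random selection can then be applied to it.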

Answer:

To sample randomly, create the expanded data with crossing(), filter() out the nonexistent A3 combination, and then slice() each date to keep its first k rows, where k is drawn at random.

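library(dplyr)
library(tidyr)
library(stringr)

# Every Date x Dept x Prog-number combination for January 2022
crossing(
  Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"),
  Dept = c("A", "B"),
  Prog = 1:3
) %>%
  mutate(Prog = str_c(Dept, Prog)) %>%   # A1, A2, A3, B1, B2, B3
  filter(Prog != "A3") %>%               # Dept A has only A1 and A2
  mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>%
  group_by(Date) %>%
  # keep the first k rows of each date, with k drawn at random from 1..5
  slice(seq_len(sample(row_number(), 1))) %>%
  ungroup()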

Output:

# A tibble: 102 × 4
   Date       Dept  Prog  Amount
   <date>     <chr> <chr>  <dbl>
 1 2022-01-01 A     A1    83964.
 2 2022-01-01 A     A2    93428.
 3 2022-01-01 B     B1    85187.
 4 2022-01-01 B     B2    79144.
 5 2022-01-01 B     B3    65784.
 6 2022-01-02 A     A1    86014.
 7 2022-01-03 A     A1    76060.
 8 2022-01-03 A     A2    56412.
 9 2022-01-03 B     B1    87365.
10 2022-01-03 B     B2    66169.
# … with 92 more rows
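The slice(seq_len(sample(row_number(), 1))) step keeps a random-length prefix of each date, so it can drop trailing levels (2022-01-02 above keeps only A1) but never a level in the middle, whereas the desired output drops A2 on 2022-01-02 while keeping B1 to B3. If any subset of levels should be possible, one option (a different technique from the answer above, sketched here under the assumption that every date keeps at least one row) is to draw both the number of rows and their positions at random inside slice(). The name sim is illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

set.seed(12345)
sim <- crossing(
  Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"),
  Dept = c("A", "B"),
  Prog = 1:3
) %>%
  mutate(Prog = str_c(Dept, Prog)) %>%
  filter(Prog != "A3") %>%
  mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>%
  group_by(Date) %>%
  # draw k from 1..n(), then k distinct row positions; sort() keeps the
  # original Dept/Prog order within each date
  slice(sort(sample(n(), sample(n(), 1)))) %>%
  ungroup()

sim
```

Because sample(n(), k) picks k distinct positions out of the date's rows, any combination of levels (for example A1, B1, B2, B3 without A2) can appear on a given date.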