Adding stratified variable to a dataframe in R

Time:02-08

I have some data that I want to split into 4 equal parts, stratified by group.

My dataframe looks like this:

X Group
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 3
10 3
11 3
12 3
13 3
14 3
15 3
16 3

Now I thought about adding a third column to mark which rows belong to which split, like this:

X Group Split
1 1 1
2 1 3
3 1 2
4 1 4
5 1 4
6 1 2
7 2 3
8 2 1
9 3 1
10 3 2
11 3 3
12 3 4
13 3 1
14 3 2
15 3 3
16 3 4

I don't need to actually split the dataset, because the data are videos and I just have to mark who (which person) has to watch them.

I know how to generate random numbers, but I need them to be stratified by group.

I know how to get a stratified sample, but that's not what I want, because I want to distribute ALL data (videos in this case), just in a stratified fashion.

Can you help me achieve this?

Thank you!

Edit: I changed the example to unequally sized groups.

CodePudding user response:

You can easily do this kind of stratified operation using dplyr::group_by():

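library(tidyverse)

# Example data: three groups of four rows each
df <- data.frame(
    X = 1:12,
    Group = c(rep(1,4), rep(2,4), rep(3,4))
)

# Within each group, shuffle the row positions, then map them onto splits 1-4
df %>%
  group_by(Group) %>%
  mutate(Split = sample(seq_along(X), size = n(), replace = FALSE) %% 4 + 1) %>%
  ungroup()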
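
With the unequally sized groups from the edited question (6, 2 and 8 rows), assigning splits separately per group can leave the four splits with different overall sizes, because every group restarts the 1-4 cycle. One possible variation, sketched here in base R and not part of the answer above, is to shuffle the rows within each group and then deal the labels 1-4 round-robin down the whole group-sorted table, so the cycle carries on across group boundaries. The names `k` and `ord` and the seed are assumptions made for illustration only.

```r
# Data from the question: groups of 6, 2 and 8 videos
df <- data.frame(
  X = 1:16,
  Group = rep(c(1, 2, 3), times = c(6, 2, 8))
)

set.seed(1)  # arbitrary seed, only to make the illustration reproducible
k <- 4       # number of splits (people watching the videos)

# Sort rows by Group, breaking ties with random numbers:
# this puts the rows in random order within each group.
ord <- order(df$Group, runif(nrow(df)))

# Deal the labels 1..k round-robin along that ordering. The cycle is not
# restarted at group boundaries, so the overall split sizes stay balanced.
df$Split <- NA_integer_
df$Split[ord] <- rep_len(seq_len(k), nrow(df))

# Cross-tabulate to inspect the balance within each group
table(df$Group, df$Split)
```

For this example every split ends up with 4 videos overall, and within each group the split sizes differ by at most one. Which rows land in which split still depends on the random shuffle.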