Home > Blockchain >  Randomly swap dates between sets of observations taken on the same day for each group
Randomly swap dates between sets of observations taken on the same day for each group

Time:07-26

I have several individuals (ID) which can have many observations (obs) nested within the same date. What I would like to do is randomly assign dates to each set of nested observations, from each of the dates in the observed data. I would also like to restrict swaps by individual, and do this many times, producing different swaps on each iteration. The data looks something like this

ID obs date
x A 1/12/19
x B 1/12/19
x C 1/12/19
x D 1/12/19
y A 1/12/19
y B 1/12/19
y C 1/12/19
y D 1/12/19
x A 2/12/19
x B 2/12/19
x C 2/12/19
x D 2/12/19
y A 2/12/19
y B 2/12/19
y C 2/12/19
y D 2/12/19

And what I would like is this, albeit swapped randomly, where the ID and obs cols remain the same and only the order of date sets are swapped.

ID obs date
x A 2/12/19
x B 2/12/19
x C 2/12/19
x D 2/12/19
y A 2/12/19
y B 2/12/19
y C 2/12/19
y D 2/12/19
x A 1/12/19
x B 1/12/19
x C 1/12/19
x D 1/12/19
y A 1/12/19
y B 1/12/19
y C 1/12/19
y D 1/12/19

Currently I have the following code to do this

library(dplyr)
# Produce sample data
ID <- rep(c("X", "Y"),6) # individuals
obs <- rep(c("A", "A", "B", "B"),3) # nested observations
date <- c("2019-10-14", "2019-10-14", "2019-10-14",
          "2019-10-14", "2019-10-15", "2019-10-15", 
          "2019-10-15", "2019-10-15", "2019-10-16", 
          "2019-10-16", "2019-10-16", "2019-10-16") # dates to swap

df <- as.data.frame(cbind(ID, obs, date))

list <- list()
N <- 10 # no of permutations
for (i in 1:N) {
list[[i]] <- df %>%
  select(ID, obs, date) %>%
  group_by(ID, obs) %>% # restrict to sampling dates within the same ID and set of obs
  do(sample_n(., size = nrow(.))) %>%
  rename("rand.date" = "date")
}

rand.added <- list()
for (i in 1:N) {
df <- df %>% arrange(ID, obs) # arrange by ID and set of nested obs
rand.added[[i]] <- cbind(df, list[[i]]$rand.date) # add rand.date col
}

but this doesn't seem to produce the desired result I am after (nested observations aren't conserved for each individual). It also produces the same results on each iteration of the loop while I am after different swaps on each iteration.

Is there a way I can achieve this? Any help would be greatly appreciated!

CodePudding user response:

Shuffling date within ID but across obs

You noted that you want swaps to be restricted by individual. I take this to mean that you want to randomize all values of date within individual (ID) but across observations (obs).

In that case, we can use group_by(ID) to ensure we don't shuffle date across persons. We then don't even need to pay attention to obs. All we'll do is shuffle the date column and save it as date. We can then bind everything into one data.frame.

# Number of resamples
N <- 10

# Set seed for reproducibility
set.seed(123)

# Group your data by individual then shuffle only the date column
df_grouped <- df %>% group_by(ID)

result_list <- lapply(1:N, function(counter) {
  df_grouped %>%
    mutate(date = sample(date, size = n()),
           simulation = counter) %>%
    ungroup()
})

# Bind into a single data.frame
result_list %>% 
  bind_rows() %>%
  head(20)
#> # A tibble: 20 x 4
#>    ID    obs   date       simulation
#>    <chr> <chr> <chr>           <int>
#>  1 X     A     2019-10-15          1
#>  2 Y     A     2019-10-16          1
#>  3 X     B     2019-10-16          1
#>  4 Y     B     2019-10-15          1
#>  5 X     A     2019-10-14          1
#>  6 Y     A     2019-10-14          1
#>  7 X     B     2019-10-15          1
#>  8 Y     B     2019-10-16          1
#>  9 X     A     2019-10-16          1
#> 10 Y     A     2019-10-14          1
#> 11 X     B     2019-10-14          1
#> 12 Y     B     2019-10-15          1
#> 13 X     A     2019-10-15          2
#> 14 Y     A     2019-10-14          2
#> 15 X     B     2019-10-16          2
#> 16 Y     B     2019-10-16          2
#> 17 X     A     2019-10-16          2
#> 18 Y     A     2019-10-16          2
#> 19 X     B     2019-10-15          2
#> 20 Y     B     2019-10-15          2

Shuffling date within ID and obs

Alternatively, if you want to shuffle it within ID and obs, just add obs to group_by().

library(dplyr)

# Number of resamples
N <- 10

# Set seed for reproducibility
set.seed(123)

# Group your data by individual AND observation then shuffle only the date column
df_grouped <- df %>% group_by(ID, obs)

result_list <- lapply(1:N, function(counter) {
  df_grouped %>%
    mutate(date = sample(date, size = n()),
           simulation = counter) %>%
    ungroup()
})

# Bind into a single data.frame
result_list %>% 
  bind_rows() %>%
  head(20)
#> # A tibble: 20 x 4
#>    ID    obs   date       simulation
#>    <chr> <chr> <chr>           <int>
#>  1 X     A     2019-10-16          1
#>  2 Y     A     2019-10-15          1
#>  3 X     B     2019-10-15          1
#>  4 Y     B     2019-10-14          1
#>  5 X     A     2019-10-14          1
#>  6 Y     A     2019-10-16          1
#>  7 X     B     2019-10-14          1
#>  8 Y     B     2019-10-15          1
#>  9 X     A     2019-10-15          1
#> 10 Y     A     2019-10-14          1
#> 11 X     B     2019-10-16          1
#> 12 Y     B     2019-10-16          1
#> 13 X     A     2019-10-15          2
#> 14 Y     A     2019-10-16          2
#> 15 X     B     2019-10-16          2
#> 16 Y     B     2019-10-14          2
#> 17 X     A     2019-10-14          2
#> 18 Y     A     2019-10-14          2
#> 19 X     B     2019-10-15          2
#> 20 Y     B     2019-10-16          2

CodePudding user response:

Maybe this could be something you could work on:

library(dplyr)
library(map)
data %>% 
  mutate(orig_rows = row_number()) %>% 
  group_split(ID, obs) %>%
  map(~.x %>% mutate(date = ifelse(dplyr::lag(date) %in% NA, dplyr::lead(date), dplyr::lag(date)))) %>% 
  bind_rows() %>% 
  arrange(orig_rows) %>% 
  select(-orig_rows)
  • Related