I have several individuals (ID) which can have many observations (obs) nested within the same date. What I would like to do is randomly assign dates to each set of nested observations, from each of the dates in the observed data. I would also like to restrict swaps by individual, and do this many times, producing different swaps on each iteration. The data looks something like this
ID | obs | date |
---|---|---|
x | A | 1/12/19 |
x | B | 1/12/19 |
x | C | 1/12/19 |
x | D | 1/12/19 |
y | A | 1/12/19 |
y | B | 1/12/19 |
y | C | 1/12/19 |
y | D | 1/12/19 |
x | A | 2/12/19 |
x | B | 2/12/19 |
x | C | 2/12/19 |
x | D | 2/12/19 |
y | A | 2/12/19 |
y | B | 2/12/19 |
y | C | 2/12/19 |
y | D | 2/12/19 |
And what I would like is this, albeit swapped randomly, where the ID and obs cols remain the same and only the order of date sets are swapped.
ID | obs | date |
---|---|---|
x | A | 2/12/19 |
x | B | 2/12/19 |
x | C | 2/12/19 |
x | D | 2/12/19 |
y | A | 2/12/19 |
y | B | 2/12/19 |
y | C | 2/12/19 |
y | D | 2/12/19 |
x | A | 1/12/19 |
x | B | 1/12/19 |
x | C | 1/12/19 |
x | D | 1/12/19 |
y | A | 1/12/19 |
y | B | 1/12/19 |
y | C | 1/12/19 |
y | D | 1/12/19 |
Currently I have the following code to do this
library(dplyr)
# Produce sample data
ID <- rep(c("X", "Y"),6) # individuals
obs <- rep(c("A", "A", "B", "B"),3) # nested observations
date <- c("2019-10-14", "2019-10-14", "2019-10-14",
"2019-10-14", "2019-10-15", "2019-10-15",
"2019-10-15", "2019-10-15", "2019-10-16",
"2019-10-16", "2019-10-16", "2019-10-16") # dates to swap
df <- as.data.frame(cbind(ID, obs, date))
list <- list()
N <- 10 # no of permutations
for (i in 1:N) {
list[[i]] <- df %>%
select(ID, obs, date) %>%
group_by(ID, obs) %>% # restrict to sampling dates within the same ID and set of obs
do(sample_n(., size = nrow(.))) %>%
rename("rand.date" = "date")
}
rand.added <- list()
for (i in 1:N) {
df <- df %>% arrange(ID, obs) # arrange by ID and set of nested obs
rand.added[[i]] <- cbind(df, list[[i]]$rand.date) # add rand.date col
}
but this doesn't seem to produce the desired result I am after (nested observations aren't conserved for each individual). It also produces the same results on each iteration of the loop while I am after different swaps on each iteration.
Is there a way I can achieve this? Any help would be greatly appreciated!
CodePudding user response:
Shuffling date
within ID
but across obs
You noted that you want swaps to be restricted by individual. I take this to mean that you want to randomize all values of date
within individual (ID
) but across observations (obs
).
In that case, we can use group_by(ID)
to ensure we don't shuffle date
across persons. We then don't even need to pay attention to obs
. All we'll do is shuffle the date
column and save it as date
. We can then bind everything into one data.frame
.
# Number of resamples
N <- 10
# Set seed for reproducibility
set.seed(123)
# Group your data by individual then shuffle only the date column
df_grouped <- df %>% group_by(ID)
result_list <- lapply(1:N, function(counter) {
df_grouped %>%
mutate(date = sample(date, size = n()),
simulation = counter) %>%
ungroup()
})
# Bind into a single data.frame
result_list %>%
bind_rows() %>%
head(20)
#> # A tibble: 20 x 4
#> ID obs date simulation
#> <chr> <chr> <chr> <int>
#> 1 X A 2019-10-15 1
#> 2 Y A 2019-10-16 1
#> 3 X B 2019-10-16 1
#> 4 Y B 2019-10-15 1
#> 5 X A 2019-10-14 1
#> 6 Y A 2019-10-14 1
#> 7 X B 2019-10-15 1
#> 8 Y B 2019-10-16 1
#> 9 X A 2019-10-16 1
#> 10 Y A 2019-10-14 1
#> 11 X B 2019-10-14 1
#> 12 Y B 2019-10-15 1
#> 13 X A 2019-10-15 2
#> 14 Y A 2019-10-14 2
#> 15 X B 2019-10-16 2
#> 16 Y B 2019-10-16 2
#> 17 X A 2019-10-16 2
#> 18 Y A 2019-10-16 2
#> 19 X B 2019-10-15 2
#> 20 Y B 2019-10-15 2
Shuffling date
within ID
and obs
Alternatively, if you want to shuffle it within ID
and obs
, just add obs
to group_by()
.
library(dplyr)
# Number of resamples
N <- 10
# Set seed for reproducibility
set.seed(123)
# Group your data by individual AND observation then shuffle only the date column
df_grouped <- df %>% group_by(ID, obs)
result_list <- lapply(1:N, function(counter) {
df_grouped %>%
mutate(date = sample(date, size = n()),
simulation = counter) %>%
ungroup()
})
# Bind into a single data.frame
result_list %>%
bind_rows() %>%
head(20)
#> # A tibble: 20 x 4
#> ID obs date simulation
#> <chr> <chr> <chr> <int>
#> 1 X A 2019-10-16 1
#> 2 Y A 2019-10-15 1
#> 3 X B 2019-10-15 1
#> 4 Y B 2019-10-14 1
#> 5 X A 2019-10-14 1
#> 6 Y A 2019-10-16 1
#> 7 X B 2019-10-14 1
#> 8 Y B 2019-10-15 1
#> 9 X A 2019-10-15 1
#> 10 Y A 2019-10-14 1
#> 11 X B 2019-10-16 1
#> 12 Y B 2019-10-16 1
#> 13 X A 2019-10-15 2
#> 14 Y A 2019-10-16 2
#> 15 X B 2019-10-16 2
#> 16 Y B 2019-10-14 2
#> 17 X A 2019-10-14 2
#> 18 Y A 2019-10-14 2
#> 19 X B 2019-10-15 2
#> 20 Y B 2019-10-16 2
CodePudding user response:
Maybe this could be something you could work on:
library(dplyr)
library(map)
data %>%
mutate(orig_rows = row_number()) %>%
group_split(ID, obs) %>%
map(~.x %>% mutate(date = ifelse(dplyr::lag(date) %in% NA, dplyr::lead(date), dplyr::lag(date)))) %>%
bind_rows() %>%
arrange(orig_rows) %>%
select(-orig_rows)