I have a dataset like this:
library(dplyr)  # provides %>%, mutate() and row_number()

set.seed(123)
date_entry <- sample(seq(as.Date('2000-01-01'), as.Date('2010-01-01'), by = "day"), 1000)
df <- data.frame(date_entry)
df <- df %>% mutate(id = row_number())
I want to generate a random date_end column for each id that is greater than date_entry. For instance, for these dates, I want greater than 2006 for id=1:3 and 2002 for id=4.
date_entry id
1 2006-09-28 1
2 2006-11-15 2
3 2006-02-04 3
4 2001-06-09 4
5 2000-07-13 5
CodePudding user response:
Pick a random number of days to add to each date_entry. Here I sample uniformly between 1 and 100,000 days to add; pick whatever range of possibilities / distribution you want.
df %>%
  mutate(date_end = date_entry + sample(1:1e5, size = n(), replace = TRUE))
# date_entry id date_end
# 1 2006-09-28 1 2104-02-13
# 2 2006-11-15 2 2199-06-24
# 3 2006-02-04 3 2042-08-30
# 4 2001-06-09 4 2153-04-10
# 5 2000-07-13 5 2140-04-28
# 6 2008-03-04 6 2106-07-06
# 7 2005-01-15 7 2169-06-14
# ...
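As one illustration of swapping in a different distribution, here is a minimal sketch that draws the number of added days from an exponential distribution instead of a uniform one. The 30-day mean and the column name days_added are assumptions made up for this example, not something from the question.

```r
library(dplyr)

set.seed(123)
df <- data.frame(
  date_entry = sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
) %>%
  mutate(id = row_number())

# Assumption: durations average roughly 30 days; rexp() is only an illustration.
df_exp <- df %>%
  mutate(
    # floor() gives a whole number of days; adding 1 guarantees at least one day
    days_added = 1L + as.integer(floor(rexp(n(), rate = 1 / 30))),
    date_end   = date_entry + days_added
  )

# Every date_end is strictly later than its date_entry
stopifnot(all(df_exp$date_end > df_exp$date_entry))
```

Because at least one day is always added, date_end can never equal date_entry, whatever distribution is used for the rest.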
If you want to make sure the date_end is in the following year (maybe somewhat implied in your question?), round up before adding random days:
df %>%
  mutate(date_end =
    lubridate::ceiling_date(date_entry, unit = "year") +
      sample(0:1e5, size = n(), replace = TRUE)
  )
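If lubridate is not available, the same "start of the following year" idea can be sketched in base R. The helper variable next_year_start and the 3650-day upper bound are assumptions chosen for this example.

```r
set.seed(123)
date_entry <- sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
df <- data.frame(date_entry = date_entry, id = seq_along(date_entry))

# January 1 of the year after each date_entry
entry_year      <- as.integer(format(df$date_entry, "%Y"))
next_year_start <- as.Date(paste0(entry_year + 1L, "-01-01"))

# Add 0 to 3650 random days (the upper bound is an arbitrary assumption)
df$date_end <- next_year_start + sample(0:3650, size = nrow(df), replace = TRUE)

# The year of date_end is always later than the year of date_entry
end_year <- as.integer(format(df$date_end, "%Y"))
stopifnot(all(end_year > entry_year))
```

Adding 0 days is allowed here because January 1 of the next year is already later than any date in the entry year.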
CodePudding user response:
Create a daily sequence between date_entry and today's date (i.e., Sys.Date()), then pick 1 sample for date_end.
library(tidyverse)
df %>%
  rowwise() %>%
  # start one day after date_entry so that date_end is strictly greater
  mutate(date_end = sample(seq(date_entry + 1, Sys.Date(), by = "day"), 1)) %>%
  ungroup()
Output
date_entry id date_end
<date> <int> <date>
1 2006-09-28 1 2016-01-08
2 2006-11-15 2 2019-04-27
3 2006-02-04 3 2016-02-17
4 2001-06-09 4 2012-12-26
5 2000-07-13 5 2008-11-12
6 2008-03-04 6 2011-12-27
7 2005-01-15 7 2015-01-04
8 2003-02-15 8 2020-07-28
9 2009-03-24 9 2014-11-01
10 2003-06-06 10 2004-03-22
# … with 990 more rows
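Building a full daily sequence for every row can be slow on larger data. A vectorized sketch of the same idea is to draw a whole number of days between 1 and the gap to today for each row. The names today and max_days are assumptions introduced for this example, and it relies on every date_entry being before today.

```r
library(dplyr)

set.seed(123)
df <- data.frame(
  date_entry = sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
) %>%
  mutate(id = row_number())

today <- Sys.Date()

df_vec <- df %>%
  mutate(
    # number of days between date_entry and today (positive for this data)
    max_days = as.integer(today - date_entry),
    # runif() lies strictly between 0 and 1, so ceiling() yields 1..max_days
    date_end = date_entry + ceiling(runif(n()) * max_days)
  )

# date_end is after date_entry and no later than today
stopifnot(all(df_vec$date_end > df_vec$date_entry))
stopifnot(all(df_vec$date_end <= today))
```

This avoids rowwise() entirely, since both the subtraction and the addition on Date columns work element by element.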