Generate random date after a date

I have a dataset like this:

library(dplyr)

set.seed(123)
date_entry <- sample(seq(as.Date('2000-01-01'), as.Date('2010-01-01'), by = "day"), 1000)
df <- data.frame(date_entry)
df <- df %>% mutate(id = row_number())

I want to generate a random date_end column for each id that is later than date_entry. For instance, for these dates, I want date_end greater than 2006 for id=1:3 and greater than 2002 for id=4.

    date_entry  id
1   2006-09-28   1
2   2006-11-15   2
3   2006-02-04   3
4   2001-06-09   4
5   2000-07-13   5

CodePudding user response：

Pick a random number of days to add to each date_entry. Here I sample uniformly between 1 and 100,000 days to add - pick whatever range of possibilities / distribution you want.

df %>%
  mutate(date_end = date_entry + sample(1:1e5, size = n(), replace = TRUE))
#     date_entry  id   date_end
# 1   2006-09-28   1 2104-02-13
# 2   2006-11-15   2 2199-06-24
# 3   2006-02-04   3 2042-08-30
# 4   2001-06-09   4 2153-04-10
# 5   2000-07-13   5 2140-04-28
# 6   2008-03-04   6 2106-07-06
# 7   2005-01-15   7 2169-06-14
# ...
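
To illustrate the "pick whatever distribution you want" part, here is a minimal sketch that draws the offset from a Poisson distribution instead of a uniform one. The mean of 365 days and the helper column days_added are arbitrary assumptions for the example, not something from the question; the 1 + guarantees that at least one day is added.

```r
library(dplyr)

set.seed(123)
df <- data.frame(
  date_entry = sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
) %>%
  mutate(id = row_number())

# Assumption: offsets centered around one year (lambda = 365 is arbitrary)
df_pois <- df %>%
  mutate(
    days_added = 1 + rpois(n(), lambda = 365),  # always >= 1
    date_end   = date_entry + days_added
  )

# Sanity check: every end date is strictly after its entry date
stopifnot(all(df_pois$date_end > df_pois$date_entry))
```

Any other generator that returns non-negative numbers (for example rgeom() or rexp() with rounding) could be swapped in the same way.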

If you want to make sure the date_end is in the following year (maybe somewhat implied in your question?), round up before adding random days:

df %>%
  mutate(date_end = 
    lubridate::ceiling_date(date_entry, unit = "year") +
      sample(0:1e5, size = n(), replace = TRUE)
  )
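
To see what the rounding step does on its own, here is a small sketch using two of the dates from the question. It uses a base-R stand-in (extract the year, add one, rebuild January 1) rather than lubridate, so it is an illustration of the idea and not the code above; next_jan1 is a name made up for the example.

```r
x <- as.Date(c("2006-09-28", "2001-06-09"))

# Base-R stand-in for rounding a date up to the next January 1
next_jan1 <- as.Date(paste0(as.integer(format(x, "%Y")) + 1L, "-01-01"))
next_jan1
# [1] "2007-01-01" "2002-01-01"

# Adding a non-negative offset keeps date_end on or after that January 1
set.seed(1)
date_end <- next_jan1 + sample(0:1e5, size = length(x), replace = TRUE)
stopifnot(all(date_end >= next_jan1))
```

Because the offset starts at 0, date_end can land exactly on January 1 of the following year.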

CodePudding user response：

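Create a daily sequence from the day after date_entry to today's date (i.e., Sys.Date()), then sample one date from it as date_end. Starting the sequence at date_entry + 1 keeps date_end strictly later than date_entry.

library(tidyverse)

df %>% 
  rowwise() %>% 
  mutate(date_end = sample(seq(date_entry + 1, Sys.Date(), by = "day"), 1))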

Output

   date_entry    id date_end  
   <date>     <int> <date>    
 1 2006-09-28     1 2016-01-08
 2 2006-11-15     2 2019-04-27
 3 2006-02-04     3 2016-02-17
 4 2001-06-09     4 2012-12-26
 5 2000-07-13     5 2008-11-12
 6 2008-03-04     6 2011-12-27
 7 2005-01-15     7 2015-01-04
 8 2003-02-15     8 2020-07-28
 9 2009-03-24     9 2014-11-01
10 2003-06-06    10 2004-03-22
# … with 990 more rows
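
Building one sequence per row can be slow on larger data. A vectorized sketch of the same idea, assuming every date_entry lies before today, is to compute how many days are available for each row and scale a uniform draw by that count. The names last_day, days_available, and offset are made up for this example.

```r
set.seed(123)
date_entry <- sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
df <- data.frame(date_entry, id = seq_along(date_entry))

last_day <- Sys.Date()

# Number of candidate days after each entry date (assumed to be >= 1)
days_available <- as.integer(last_day - df$date_entry)

# runif() gives values in (0, 1); floor(...) + 1 maps them to 1..days_available
offset <- floor(runif(nrow(df)) * days_available) + 1

df$date_end <- df$date_entry + offset

# Every end date is after its entry date and no later than today
stopifnot(all(df$date_end > df$date_entry), all(df$date_end <= last_day))
```

This draws uniformly over the same set of dates as sampling from the per-row sequence, without the rowwise step.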