Home > Software design >  Want to add a column of integers to columns of dates, then count days from start to events
Want to add a column of integers to columns of dates, then count days from start to events

Time:05-27

I want to add a dataframe column of integers to dataframe columns of dates in the same row, putting the result into new columns. Then count days from start to events in the same row, putting those into new columns. Have many date columns and many more than 5 rows in the dataframe, so a loop or something like that might be good.

This is what I have:

havedf <- data.frame(id =           c("100","103","104","107","109"),
                     startDate =    c("2015-04-02","2017-01-10","2016-02-11","2012-01-16","2014-03-05"),
                     eventOneDate = c("2015-04-10","2017-03-13","2016-03-04","2012-03-06","2014-03-11"),           
                     eventTwoDate = c("2015-05-02","2017-08-12","2016-05-09","2012-04-05","2014-06-09"),
                     extradays =    c(12,9,16,5,10))

This is what I want:

wantdf <- data.frame(id =                   c("100","103","104","107","109"),
                     startDate =            c("2015-04-02","2017-01-10","2016-02-11","2012-01-16","2014-03-05"),
                     eventOneDate =         c("2015-04-10","2017-03-13","2016-03-04","2012-03-06","2014-03-11"),           
                     eventTwoDate =         c("2015-05-02","2017-08-12","2016-05-09","2012-04-05","2014-06-09"),
                     extradays =            c(12,9,16,5,10),
                     startDate_w_extra =    c("2015-04-14","2017-01-19","2016-02-27","2012-01-21","2014-03-15"),
                     eventOneDate_w_extra = c("2015-04-22","2017-03-22","2016-03-20","2012-03-11","2014-03-21"),           
                     eventTwoDate_w_extra = c("2015-05-14","2017-08-21","2016-05-25","2012-04-10","2014-06-19"),
                     daysToEventOne =       c(8,62,22,50,6),
                     daysToEventTwo =       c(30,214,88,80,96))

Is there a way with more steps or base R so the code is more easily understood by someone else later?

CodePudding user response:

With dplyr, you can use across() to simplify repeated column manipulations.

library(dplyr)

havedf %>%
  mutate(across(contains("Date"), as.Date),
         across(contains("Date"), ` `, extradays, .names = "{.col}_w_extra"),
         across(matches("^event. Date$"), `-`, startDate, .names = "daysTo{.col}"))

# # A tibble: 5 × 10
#   id    startDate  eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra daysToeventOneDate daysToeventTwoDate
#   <chr> <date>     <date>       <date>           <dbl> <date>            <date>               <date>               <drtn>             <drtn>            
# 1 100   2015-04-02 2015-04-10   2015-05-02          12 2015-04-14        2015-04-22           2015-05-14            8 days             30 days          
# 2 103   2017-01-10 2017-03-13   2017-08-12           9 2017-01-19        2017-03-22           2017-08-21           62 days            214 days          
# 3 104   2016-02-11 2016-03-04   2016-05-09          16 2016-02-27        2016-03-20           2016-05-25           22 days             88 days          
# 4 107   2012-01-16 2012-03-06   2012-04-05           5 2012-01-21        2012-03-11           2012-04-10           50 days             80 days          
# 5 109   2014-03-05 2014-03-11   2014-06-09          10 2014-03-15        2014-03-21           2014-06-19            6 days             96 days

CodePudding user response:

Here is how we can do it:

  1. transform all dates to date format. Here we do it with ymd() function from lubridate package.

Then you could do simple maths:

library(lubridate)
library(dplyr)

havedf %>% 
  mutate(across(contains("Date"), ymd)) %>% 
  mutate(startDate_w_extra = startDate extradays,
         eventOneDate_w_extra = eventOneDate   extradays,
         eventTwoDate_w_extra = eventTwoDate   extradays,
         daysToEventOne = as.integer(eventOneDate - startDate),
         daysToEventTwo = as.integer(eventTwoDate - startDate)
         )
   id  startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra daysToEventOne daysToEventTwo
1 100 2015-04-02   2015-04-10   2015-05-02        12        2015-04-14           2015-04-22           2015-05-14              8             30
2 103 2017-01-10   2017-03-13   2017-08-12         9        2017-01-19           2017-03-22           2017-08-21             62            214
3 104 2016-02-11   2016-03-04   2016-05-09        16        2016-02-27           2016-03-20           2016-05-25             22             88
4 107 2012-01-16   2012-03-06   2012-04-05         5        2012-01-21           2012-03-11           2012-04-10             50             80
5 109 2014-03-05   2014-03-11   2014-06-09        10        2014-03-15           2014-03-21           2014-06-19              6             96

CodePudding user response:

A simplified version of @TarJae 's answer:

library(lubridate)
library(dplyr)

havedf %>% 
  mutate(across(contains("Date"), ymd)) %>% 
  mutate(across(contains("Date"), .fns = list(w_extra = ~ . extradays)),
         daysToEventOne = as.integer(eventOneDate - startDate),
         daysToEventTwo = as.integer(eventTwoDate - startDate))


   id  startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra
1 100 2015-04-02   2015-04-10   2015-05-02        12        2015-04-14           2015-04-22           2015-05-14
2 103 2017-01-10   2017-03-13   2017-08-12         9        2017-01-19           2017-03-22           2017-08-21
3 104 2016-02-11   2016-03-04   2016-05-09        16        2016-02-27           2016-03-20           2016-05-25
4 107 2012-01-16   2012-03-06   2012-04-05         5        2012-01-21           2012-03-11           2012-04-10
5 109 2014-03-05   2014-03-11   2014-06-09        10        2014-03-15           2014-03-21           2014-06-19
  daysToEventOne daysToEventTwo
1              8             30
2             62            214
3             22             88
4             50             80
5              6             96

If you have more than two events and do not want to select the event date columns, you can use this:

havedf %>% 
  mutate(across(contains("Date"), ymd)) %>% 
  mutate(across(contains("Date"), .fns = list(w_extra = ~ . extradays)),
         across(contains("event") & !contains("w_extra"), .fns = list(daysTo = ~ .- startDate)))

Output:

   id  startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra
1 100 2015-04-02   2015-04-10   2015-05-02        12        2015-04-14           2015-04-22           2015-05-14
2 103 2017-01-10   2017-03-13   2017-08-12         9        2017-01-19           2017-03-22           2017-08-21
3 104 2016-02-11   2016-03-04   2016-05-09        16        2016-02-27           2016-03-20           2016-05-25
4 107 2012-01-16   2012-03-06   2012-04-05         5        2012-01-21           2012-03-11           2012-04-10
5 109 2014-03-05   2014-03-11   2014-06-09        10        2014-03-15           2014-03-21           2014-06-19
  eventOneDate_daysTo eventTwoDate_daysTo
1              8 days             30 days
2             62 days            214 days
3             22 days             88 days
4             50 days             80 days
5              6 days             96 days
  • Related