I want to add a dataframe column of integers to dataframe columns of dates in the same row, putting the result into new columns. Then count days from start to events in the same row, putting those into new columns. Have many date columns and many more than 5 rows in the dataframe, so a loop or something like that might be good.
This is what I have:
havedf <- data.frame(id = c("100","103","104","107","109"),
startDate = c("2015-04-02","2017-01-10","2016-02-11","2012-01-16","2014-03-05"),
eventOneDate = c("2015-04-10","2017-03-13","2016-03-04","2012-03-06","2014-03-11"),
eventTwoDate = c("2015-05-02","2017-08-12","2016-05-09","2012-04-05","2014-06-09"),
extradays = c(12,9,16,5,10))
This is what I want:
wantdf <- data.frame(id = c("100","103","104","107","109"),
startDate = c("2015-04-02","2017-01-10","2016-02-11","2012-01-16","2014-03-05"),
eventOneDate = c("2015-04-10","2017-03-13","2016-03-04","2012-03-06","2014-03-11"),
eventTwoDate = c("2015-05-02","2017-08-12","2016-05-09","2012-04-05","2014-06-09"),
extradays = c(12,9,16,5,10),
startDate_w_extra = c("2015-04-14","2017-01-19","2016-02-27","2012-01-21","2014-03-15"),
eventOneDate_w_extra = c("2015-04-22","2017-03-22","2016-03-20","2012-03-11","2014-03-21"),
eventTwoDate_w_extra = c("2015-05-14","2017-08-21","2016-05-25","2012-04-10","2014-06-19"),
daysToEventOne = c(8,62,22,50,6),
daysToEventTwo = c(30,214,88,80,96))
Is there a way with more steps or base R so the code is more easily understood by someone else later?
CodePudding user response:
With dplyr
, you can use across()
to simplify repeated column manipulations.
library(dplyr)
havedf %>%
mutate(across(contains("Date"), as.Date),
across(contains("Date"), ` `, extradays, .names = "{.col}_w_extra"),
across(matches("^event. Date$"), `-`, startDate, .names = "daysTo{.col}"))
# # A tibble: 5 × 10
# id startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra daysToeventOneDate daysToeventTwoDate
# <chr> <date> <date> <date> <dbl> <date> <date> <date> <drtn> <drtn>
# 1 100 2015-04-02 2015-04-10 2015-05-02 12 2015-04-14 2015-04-22 2015-05-14 8 days 30 days
# 2 103 2017-01-10 2017-03-13 2017-08-12 9 2017-01-19 2017-03-22 2017-08-21 62 days 214 days
# 3 104 2016-02-11 2016-03-04 2016-05-09 16 2016-02-27 2016-03-20 2016-05-25 22 days 88 days
# 4 107 2012-01-16 2012-03-06 2012-04-05 5 2012-01-21 2012-03-11 2012-04-10 50 days 80 days
# 5 109 2014-03-05 2014-03-11 2014-06-09 10 2014-03-15 2014-03-21 2014-06-19 6 days 96 days
CodePudding user response:
Here is how we can do it:
- transform all dates to date format. Here we do it with
ymd()
function fromlubridate
package.
Then you could do simple maths:
library(lubridate)
library(dplyr)
havedf %>%
mutate(across(contains("Date"), ymd)) %>%
mutate(startDate_w_extra = startDate extradays,
eventOneDate_w_extra = eventOneDate extradays,
eventTwoDate_w_extra = eventTwoDate extradays,
daysToEventOne = as.integer(eventOneDate - startDate),
daysToEventTwo = as.integer(eventTwoDate - startDate)
)
id startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra daysToEventOne daysToEventTwo
1 100 2015-04-02 2015-04-10 2015-05-02 12 2015-04-14 2015-04-22 2015-05-14 8 30
2 103 2017-01-10 2017-03-13 2017-08-12 9 2017-01-19 2017-03-22 2017-08-21 62 214
3 104 2016-02-11 2016-03-04 2016-05-09 16 2016-02-27 2016-03-20 2016-05-25 22 88
4 107 2012-01-16 2012-03-06 2012-04-05 5 2012-01-21 2012-03-11 2012-04-10 50 80
5 109 2014-03-05 2014-03-11 2014-06-09 10 2014-03-15 2014-03-21 2014-06-19 6 96
CodePudding user response:
A simplified version of @TarJae 's answer:
library(lubridate)
library(dplyr)
havedf %>%
mutate(across(contains("Date"), ymd)) %>%
mutate(across(contains("Date"), .fns = list(w_extra = ~ . extradays)),
daysToEventOne = as.integer(eventOneDate - startDate),
daysToEventTwo = as.integer(eventTwoDate - startDate))
id startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra
1 100 2015-04-02 2015-04-10 2015-05-02 12 2015-04-14 2015-04-22 2015-05-14
2 103 2017-01-10 2017-03-13 2017-08-12 9 2017-01-19 2017-03-22 2017-08-21
3 104 2016-02-11 2016-03-04 2016-05-09 16 2016-02-27 2016-03-20 2016-05-25
4 107 2012-01-16 2012-03-06 2012-04-05 5 2012-01-21 2012-03-11 2012-04-10
5 109 2014-03-05 2014-03-11 2014-06-09 10 2014-03-15 2014-03-21 2014-06-19
daysToEventOne daysToEventTwo
1 8 30
2 62 214
3 22 88
4 50 80
5 6 96
If you have more than two events and do not want to select the event date columns, you can use this:
havedf %>%
mutate(across(contains("Date"), ymd)) %>%
mutate(across(contains("Date"), .fns = list(w_extra = ~ . extradays)),
across(contains("event") & !contains("w_extra"), .fns = list(daysTo = ~ .- startDate)))
Output:
id startDate eventOneDate eventTwoDate extradays startDate_w_extra eventOneDate_w_extra eventTwoDate_w_extra
1 100 2015-04-02 2015-04-10 2015-05-02 12 2015-04-14 2015-04-22 2015-05-14
2 103 2017-01-10 2017-03-13 2017-08-12 9 2017-01-19 2017-03-22 2017-08-21
3 104 2016-02-11 2016-03-04 2016-05-09 16 2016-02-27 2016-03-20 2016-05-25
4 107 2012-01-16 2012-03-06 2012-04-05 5 2012-01-21 2012-03-11 2012-04-10
5 109 2014-03-05 2014-03-11 2014-06-09 10 2014-03-15 2014-03-21 2014-06-19
eventOneDate_daysTo eventTwoDate_daysTo
1 8 days 30 days
2 62 days 214 days
3 22 days 88 days
4 50 days 80 days
5 6 days 96 days