Home > Enterprise >  adding a column to specify duration of event based on dates
adding a column to specify duration of event based on dates

Time:07-22

I have a dataframe where i have to columns that represent the start of an event and the planned end of the event

What is the best way to add a column in which i could see the duration in days of the event in the dataframe ?

Another alternative would be to directly create a new dataset from it by using the group_by function on which i could see for each day the average duration of a campaign, but it seems too complicated

structure(list(launched_at = c("03/26/2021", "03/24/2021", "01/05/2021", 
"02/17/2021", "02/15/2021", "02/25/2021"), deadline = c("04/25/2021", 
"04/08/2021", "01/17/2021", "03/03/2021", "03/01/2021", "04/26/2021"
)), row.names = c(NA, 6L), class = "data.frame")

CodePudding user response:

One option

as.POSIXct(df$deadline,tz="UTC",format="%m/%d/%y")-
  as.POSIXct(df$launched_at,tz="UTC",format="%m/%d/%y")

Time differences in days
[1] 30 15 12 15 15 61

CodePudding user response:

If you're looking for duration between 'launched_at' and 'deadline',

library(dplyr)
df %>%
  mutate(launched_at = as.Date(launched_at, "%m/%d/%Y"),
         deadline = as.Date(deadline, "%m/%d/%Y"),
         duration = deadline - launched_at)

  launched_at   deadline duration
1  2021-03-26 2021-04-25  30 days
2  2021-03-24 2021-04-08  15 days
3  2021-01-05 2021-01-17  12 days
4  2021-02-17 2021-03-03  14 days
5  2021-02-15 2021-03-01  14 days
6  2021-02-25 2021-04-26  60 days

more concise way(@Darren Tsai)

df %>%
  mutate(across(c(launched_at, deadline), as.Date, "%m/%d/%Y"),
         duration = deadline - launched_at)

CodePudding user response:

We could use mdy function from lubridate package:

library(lubridate) 
library(dplyr)

df %>% 
  mutate(across(, mdy), # this line only if your dates are not in date format
         duration_days = as.integer(deadline - launched_at))
  launched_at   deadline duration_days
1  2021-03-26 2021-04-25            30
2  2021-03-24 2021-04-08            15
3  2021-01-05 2021-01-17            12
4  2021-02-17 2021-03-03            14
5  2021-02-15 2021-03-01            14
6  2021-02-25 2021-04-26            60

CodePudding user response:

Another option using difftime:

df <- structure(list(launched_at = c("03/26/2021", "03/24/2021", "01/05/2021", 
                                     "02/17/2021", "02/15/2021", "02/25/2021"), deadline = c("04/25/2021", 
                                                                                             "04/08/2021", "01/17/2021", "03/03/2021", "03/01/2021", "04/26/2021"
                                     )), row.names = c(NA, 6L), class = "data.frame")

df$duration <- with(df, difftime(as.Date(deadline, "%m/%d/%Y"), as.Date(launched_at, "%m/%d/%Y"), units = c("days")))
df
#>   launched_at   deadline duration
#> 1  03/26/2021 04/25/2021  30 days
#> 2  03/24/2021 04/08/2021  15 days
#> 3  01/05/2021 01/17/2021  12 days
#> 4  02/17/2021 03/03/2021  14 days
#> 5  02/15/2021 03/01/2021  14 days
#> 6  02/25/2021 04/26/2021  60 days

Created on 2022-07-22 by the reprex package (v2.0.1)

CodePudding user response:

You can use the built-in functions within and as.Date:

df = within(df, {
  launched_at = as.Date(launched_at, "%m/%d/%y")
  deadline = as.Date(deadline, "%m/%d/%y")
  duration = deadline-launched_at})

  launched_at   deadline duration
1  2020-03-26 2020-04-25  30 days
2  2020-03-24 2020-04-08  15 days
3  2020-01-05 2020-01-17  12 days
4  2020-02-17 2020-03-03  15 days
5  2020-02-15 2020-03-01  15 days
6  2020-02-25 2020-04-26  61 days
  •  Tags:  
  • r
  • Related