I have a dataframe where i have to columns that represent the start of an event and the planned end of the event
What is the best way to add a column in which i could see the duration in days of the event in the dataframe ?
Another alternative would be to directly create a new dataset from it by using the group_by function on which i could see for each day the average duration of a campaign, but it seems too complicated
structure(list(launched_at = c("03/26/2021", "03/24/2021", "01/05/2021",
"02/17/2021", "02/15/2021", "02/25/2021"), deadline = c("04/25/2021",
"04/08/2021", "01/17/2021", "03/03/2021", "03/01/2021", "04/26/2021"
)), row.names = c(NA, 6L), class = "data.frame")
CodePudding user response:
One option
as.POSIXct(df$deadline,tz="UTC",format="%m/%d/%y")-
as.POSIXct(df$launched_at,tz="UTC",format="%m/%d/%y")
Time differences in days
[1] 30 15 12 15 15 61
CodePudding user response:
If you're looking for duration between 'launched_at' and 'deadline',
library(dplyr)
df %>%
mutate(launched_at = as.Date(launched_at, "%m/%d/%Y"),
deadline = as.Date(deadline, "%m/%d/%Y"),
duration = deadline - launched_at)
launched_at deadline duration
1 2021-03-26 2021-04-25 30 days
2 2021-03-24 2021-04-08 15 days
3 2021-01-05 2021-01-17 12 days
4 2021-02-17 2021-03-03 14 days
5 2021-02-15 2021-03-01 14 days
6 2021-02-25 2021-04-26 60 days
more concise way(@Darren Tsai)
df %>%
mutate(across(c(launched_at, deadline), as.Date, "%m/%d/%Y"),
duration = deadline - launched_at)
CodePudding user response:
We could use mdy
function from lubridate
package:
library(lubridate)
library(dplyr)
df %>%
mutate(across(, mdy), # this line only if your dates are not in date format
duration_days = as.integer(deadline - launched_at))
launched_at deadline duration_days
1 2021-03-26 2021-04-25 30
2 2021-03-24 2021-04-08 15
3 2021-01-05 2021-01-17 12
4 2021-02-17 2021-03-03 14
5 2021-02-15 2021-03-01 14
6 2021-02-25 2021-04-26 60
CodePudding user response:
Another option using difftime
:
df <- structure(list(launched_at = c("03/26/2021", "03/24/2021", "01/05/2021",
"02/17/2021", "02/15/2021", "02/25/2021"), deadline = c("04/25/2021",
"04/08/2021", "01/17/2021", "03/03/2021", "03/01/2021", "04/26/2021"
)), row.names = c(NA, 6L), class = "data.frame")
df$duration <- with(df, difftime(as.Date(deadline, "%m/%d/%Y"), as.Date(launched_at, "%m/%d/%Y"), units = c("days")))
df
#> launched_at deadline duration
#> 1 03/26/2021 04/25/2021 30 days
#> 2 03/24/2021 04/08/2021 15 days
#> 3 01/05/2021 01/17/2021 12 days
#> 4 02/17/2021 03/03/2021 14 days
#> 5 02/15/2021 03/01/2021 14 days
#> 6 02/25/2021 04/26/2021 60 days
Created on 2022-07-22 by the reprex package (v2.0.1)
CodePudding user response:
You can use the built-in functions within
and as.Date
:
df = within(df, {
launched_at = as.Date(launched_at, "%m/%d/%y")
deadline = as.Date(deadline, "%m/%d/%y")
duration = deadline-launched_at})
launched_at deadline duration
1 2020-03-26 2020-04-25 30 days
2 2020-03-24 2020-04-08 15 days
3 2020-01-05 2020-01-17 12 days
4 2020-02-17 2020-03-03 15 days
5 2020-02-15 2020-03-01 15 days
6 2020-02-25 2020-04-26 61 days