I would like to know how to sum the total hours by day using a condiction that only a time assigned with an X in the third column should be sum. I have this data.frame:
Date | Time | Run |
---|---|---|
17/04/12 | 00:10:00 | x |
17/04/12 | 00:19:00 | x |
17/04/12 | 00:25:00 | x |
17/04/12 | 00:29:00 | x |
17/04/12 | 00:25:00 | |
17/04/12 | 00:29:00 | x |
17/04/12 | 00:30:00 | x |
18/04/12 | 00:10:00 | |
18/04/12 | 00:14:00 | x |
18/04/12 | 00:20:00 | x |
Result:
Date | Total_Time |
---|---|
17/04/12 | 00:20:00 |
18/04/12 | 00:06:00 |
I tried using lubridate and tidyverse packages, but I have had no success.
Thanks
CodePudding user response:
I am not sure if this is what you want to do...
library(lubridate)
library(dplyr)
df %>%
mutate(
Date = dmy(Date),
Time = hms(Time),
# minute(Time)
) %>%
filter(Run == "x") %>%
group_by(Date) %>%
summarise(Total_Time = seconds_to_period(sum(period_to_seconds(Time))))
# A tibble: 2 x 2
Date Total_Time
<date> <Period>
1 2012-04-17 1H 39M 0S
2 2012-04-18 34M 0S
CodePudding user response:
First, we can discard all time-points not marked as X
.
Then, assuming there are leaps between two consecutive runs maked as X
, we can sum up the durations of every leap per day:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
data <- tribble(
~Date , ~Time, ~Run,
"17/04/12", "00:10:00" , NA,
"17/04/12", "00:20:00" , "x",
"17/04/12" , "00:25:00" , "x",
"17/04/12", "00:29:00" , NA,
"17/04/12", "00:25:00" , "x",
"17/04/12", "00:29:00" , "x",
"18/04/12", "00:10:00" , NA,
"18/04/12", "00:14:00" , "x",
"18/04/12" , "00:20:00" , "x"
)
data %>%
mutate(date = Date) %>%
filter(! is.na(Run)) %>%
# make year to 4 digits
unite(Date, Time, col = "datetime", sep = "00 - ") %>%
mutate(
date,
leap = as.integer(row_number() / 2 - 0.1),
datetime = datetime %>% parse_datetime(format = "%d/%M/%Y - %H:%M:%S"),
) %>%
group_by(date, leap) %>%
mutate(duration = abs(lag(datetime) - datetime)) %>%
group_by(date) %>%
summarise(duration = sum(duration, na.rm = TRUE))
#> # A tibble: 2 x 2
#> date duration
#> <chr> <drtn>
#> 1 17/04/12 9 mins
#> 2 18/04/12 6 mins
Created on 2021-10-07 by the reprex package (v2.0.1)