I'm struggling on how can I calculate the quantity of the days between occurrences, since I need to calculate how many days does it take between maintenances on an equipment.
I have a dataframe with a lot of equipments and dates indicating the maintenance, then I need to calculate the days between the maintenances for each equipment. I will show a toy example:
test = data.frame(car = c("A", "A", "B", "B", "B", "C", "C", "D", "D", "D", "E"),
maintenance_date= c("20-09-2020", "25-09-2020", "14-05-2020", "20-05-2020", "20-05-2021", "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "13-01-2021"))
#test
# car maintenance_date
#1 A 20-09-2020
#2 A 25-09-2020
#3 B 14-05-2020
#4 B 20-05-2020
#5 B 20-05-2021
#6 C 11-01-2021
#7 C 13-01-2021
#8 D 13-01-2021
#9 D 15-01-2021
#10 D 15-01-2021
#11 E 13-01-2021
#for result, I'd like something like:
result
# car maintenance_date
#1 A 5
#2 B 6
#3 B 365
#4 C 2
#5 D 2
#6 D 0
I thought of using something like test %>% arrange(maintenance_date) %>% group_by(car) %>% ...
.
Any hint on how can I do that?
CodePudding user response:
We need to convert to Date
class before doing the arrange
and then do the group_by
'car' and get the diff
erence
library(dplyr)
library(lubridate)
test %>%
mutate(maintenance_date = dmy(maintenance_date)) %>%
arrange(maintenance_date) %>%
group_by(car) %>%
summarise(maintenance_date = diff(maintenance_date), .groups = 'drop')
-output
# A tibble: 6 × 2
car maintenance_date
<chr> <drtn>
1 A 5 days
2 B 6 days
3 B 365 days
4 C 2 days
5 D 2 days
6 D 0 days
CodePudding user response:
data.table
library(data.table)
setDT(test)
test[, maintenance_date := as.Date(maintenance_date, format="%d-%m-%Y")
][, .(ndays = diff(maintenance_date)), by = car]
# car ndays
# <char> <difftime>
# 1: A 5 days
# 2: B 6 days
# 3: B 365 days
# 4: C 2 days
# 5: D 2 days
# 6: D 0 days
CodePudding user response:
Another solution, tidyverse
-based, can be:
library(tidyverse)
library(lubridate)
test = data.frame(car = c("A", "A", "B", "B", "B", "C", "C", "D", "D", "D", "E"), maintenance_date= c("20-09-2020", "25-09-2020", "14-05-2020", "20-05-2020", "20-05-2021", "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "13-01-2021"))
test %>%
group_by(car) %>%
mutate(maintenance_date = c(-1,diff(dmy(maintenance_date)))) %>%
filter(maintenance_date >= 0) %>% ungroup
#> # A tibble: 6 × 2
#> # Groups: car [4]
#> car maintenance_date
#> <chr> <dbl>
#> 1 A 5
#> 2 B 6
#> 3 B 365
#> 4 C 2
#> 5 D 2
#> 6 D 0