Home > Enterprise >  How to calculate days between occurrences by groups
How to calculate days between occurrences by groups

Time:11-24

I'm struggling on how can I calculate the quantity of the days between occurrences, since I need to calculate how many days does it take between maintenances on an equipment.

I have a dataframe with a lot of equipments and dates indicating the maintenance, then I need to calculate the days between the maintenances for each equipment. I will show a toy example:

test = data.frame(car = c("A", "A", "B", "B", "B", "C", "C", "D",  "D", "D", "E"), 
                   maintenance_date= c("20-09-2020", "25-09-2020", "14-05-2020", "20-05-2020", "20-05-2021", "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "13-01-2021"))
#test
#   car      maintenance_date
#1    A            20-09-2020
#2    A            25-09-2020
#3    B            14-05-2020
#4    B            20-05-2020
#5    B            20-05-2021
#6    C            11-01-2021
#7    C            13-01-2021
#8    D            13-01-2021
#9    D            15-01-2021
#10   D            15-01-2021
#11   E            13-01-2021

#for result, I'd like something like:
result
#   car     maintenance_date
#1    A         5
#2    B         6
#3    B       365
#4    C         2
#5    D         2
#6    D         0

I thought of using something like test %>% arrange(maintenance_date) %>% group_by(car) %>% ....

Any hint on how can I do that?

CodePudding user response:

We need to convert to Date class before doing the arrange and then do the group_by 'car' and get the difference

library(dplyr)
library(lubridate)
test %>% 
   mutate(maintenance_date = dmy(maintenance_date)) %>% 
   arrange(maintenance_date) %>% 
   group_by(car) %>%
   summarise(maintenance_date = diff(maintenance_date), .groups = 'drop')

-output

# A tibble: 6 × 2
  car   maintenance_date
  <chr> <drtn>          
1 A       5 days        
2 B       6 days        
3 B     365 days        
4 C       2 days        
5 D       2 days        
6 D       0 days        

CodePudding user response:

data.table

library(data.table)
setDT(test)
test[, maintenance_date := as.Date(maintenance_date, format="%d-%m-%Y")
  ][, .(ndays = diff(maintenance_date)), by = car]
#       car      ndays
#    <char> <difftime>
# 1:      A     5 days
# 2:      B     6 days
# 3:      B   365 days
# 4:      C     2 days
# 5:      D     2 days
# 6:      D     0 days

CodePudding user response:

Another solution, tidyverse-based, can be:

library(tidyverse)
library(lubridate)

test = data.frame(car = c("A", "A", "B", "B", "B", "C", "C", "D",  "D", "D", "E"), maintenance_date= c("20-09-2020", "25-09-2020", "14-05-2020", "20-05-2020", "20-05-2021", "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "13-01-2021"))

test  %>% 
  group_by(car) %>% 
  mutate(maintenance_date = c(-1,diff(dmy(maintenance_date)))) %>% 
  filter(maintenance_date >= 0) %>% ungroup

#> # A tibble: 6 × 2
#> # Groups:   car [4]
#>   car   maintenance_date
#>   <chr>            <dbl>
#> 1 A                    5
#> 2 B                    6
#> 3 B                  365
#> 4 C                    2
#> 5 D                    2
#> 6 D                    0
  • Related