I am in need of assistance. I have some data that looks similar to this.
Machine Start Stop ServiceType
1 XX 2014-12-04 <NA> AA
2 XX 2013-09-05 2013-11-05 BB
3 XX 2013-11-21 2014-09-25 BB
4 XX 2013-10-11 2014-11-18 BB
5 XX 2021-12-03 <NA> AA
6 XX 2020-08-06 2022-09-15 AA
7 XX 2021-06-10 <NA> BB
8 YY 2020-01-17 <NA> BB
9 YY 2015-11-04 2018-04-30 BB
10 YY 2016-05-28 2019-03-21 BB
11 YY 2019-09-27 <NA> BB
12 YY 2018-01-05 <NA> AA
So what I would like to do is generate a new variable, say Maintenance or something, that's either AA
or BB
if only one service types is active or CC
if both AA
and BB
are overlapping. Like,
Machine Date Maintenance
1 XX 2013-09-05 BB
2 XX 2013-11-21 BB
3 XX 2013-10-11 AA
4 XX 2014-12-04 CC
5 XX 2021-12-03 AA
6 YY 2015-11-04 BB
7 YY 2016-05-28 CC
8 YY 2020-01-17 BB
I have been working with dplyr
and lubridate
but I am a bit unsure how to perform this task, any help would be kind.
ps. NA
's in this case can be considered a machine is on that service forever.
CodePudding user response:
There are some inconsistencies in your question (some variables do overlap but they are considered separate in your expected output), here's a way to group by overlapping values and get somewhat your expected output. This solution uses the ivs
, tidyverse
, and lubridate
libraries:
library(ivs)
library(tidyverse)
library(lubridate)
df %>%
mutate(Stop = ifelse(Stop == "<NA>", Start, Stop),
across(c(Start, Stop), ymd),
Stop = if_else(Stop == Start, Stop days(1), Stop),
ivs = iv(Start, Stop)) %>%
group_by(Machine, gp = iv_identify_group(ivs)) %>%
summarise(ServiceType = toString(unique(ServiceType)),) %>%
ungroup() %>%
mutate(gp = iv_start(gp),
ServiceType = ifelse(ServiceType %in% c("BB, AA", "AA, BB"), "CC", ServiceType))
# A tibble: 6 × 3
gp Machine ServiceType
<date> <chr> <chr>
1 2013-09-05 XX BB
2 2014-12-04 XX AA
3 2020-08-06 XX CC
4 2015-11-04 YY CC
5 2019-09-27 YY BB
6 2020-01-17 YY BB
data
df <- read.table(header = T, text = " Machine Start Stop ServiceType
1 XX 2014-12-04 NA AA
2 XX 2013-09-05 2013-11-05 BB
3 XX 2013-11-21 2014-09-25 BB
4 XX 2013-10-11 2014-11-18 BB
5 XX 2021-12-03 <NA> AA
6 XX 2020-08-06 2022-09-15 AA
7 XX 2021-06-10 <NA> BB
8 YY 2020-01-17 <NA> BB
9 YY 2015-11-04 2018-04-30 BB
10 YY 2016-05-28 2019-03-21 BB
11 YY 2019-09-27 <NA> BB
12 YY 2018-01-05 <NA> AA
")