I need to remove rows with overlapped dates and keep the x value which is maximum among the overlapped dates. Here is a data frame:
data.frame(time_left = c("2011-08-05",
"2011-07-25",
"2017-08-20",
"2017-08-20",
"2017-10-09",
"2019-06-01"),
time_right= c("2011-09-14",
"2017-09-01",
"2017-09-12",
"2017-09-26",
"2017-10-15",
"2019-11-05"),
x = c(114,20,10,1,5,100) ) -> df
so my input is:
time_left time_right x
1 2011-08-05 2011-09-14 114
2 2011-07-25 2017-09-01 20
3 2017-08-20 2017-09-12 10
4 2017-08-20 2017-09-26 1
5 2017-10-09 2017-10-15 5
6 2019-06-01 2019-11-05 100
and my desired output is:
time_left time_right x
1 2011-08-05 2011-09-14 114
2 2011-07-25 2017-09-01 20
4 2017-08-20 2017-09-26 10
5 2017-10-09 2017-10-15 5
6 2019-06-01 2019-11-05 100
I appreciate any help.
CodePudding user response:
Using ivs
package and slice_max
:
library(ivs)
library(dplyr)
df %>%
mutate(across(c(time_left, time_right), as.Date)) %>%
group_by(gp = iv_identify_group(iv(time_left, time_right))) %>%
slice_max(x)
time_left time_right x gp
<date> <date> <dbl> <iv<date>>
1 2011-08-05 2011-09-14 114 [2011-08-05, 2011-09-14)
2 2017-08-20 2017-09-12 10 [2017-08-20, 2017-09-26)
3 2017-10-09 2017-10-15 5 [2017-10-09, 2017-10-15)
4 2019-06-01 2019-11-05 100 [2019-06-01, 2019-11-05)
CodePudding user response:
This may sound a little verbose, however, this could also be a solution:
- First we identify those observations that are potentially overlapped.
- Then we group the similar ones.
- In each group we choose the minimum
time_left
and maximumtime_right
andx
.
library(tidyverse)
df %>%
mutate(across(starts_with('time'), ymd),
intv = interval(time_left, time_right),
id = row_number()) %>%
mutate(id2 = map2(intv, id, ~ if (any(.x %within% intv[intv != .x])) {
id[which(.x %within% intv[intv != .x]) 1]
} else {
.y
})) %>%
group_by(id2) %>%
summarise(time_left = min(time_left),
across(c(time_right, x), max)) %>%
select(!(id2))
# A tibble: 4 × 3
time_left time_right x
<date> <date> <dbl>
1 2011-08-05 2011-09-14 114
2 2017-08-20 2017-09-26 10
3 2017-10-09 2017-10-15 5
4 2019-06-01 2019-11-05 100