This question is related to an earlier one: "Group by id and drug (with dates <100 days of each other) take the earliest and latest date".
The dataset is:
mydata <- data.frame(
  Id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  Date = c("2000-01-01", "2000-01-05", "2000-02-02", "2000-02-12",
           "2000-02-14", "2000-05-13", "2000-05-15", "2000-05-17",
           "2000-05-16", "2000-05-20"),
  drug = c("A", "A", "B", "B", "B", "A", "A", "A", "C", "C")
)
Id Date drug
1 1 2000-01-01 A
2 1 2000-01-05 A
3 1 2000-02-02 B
4 1 2000-02-12 B
5 1 2000-02-14 B
6 1 2000-05-13 A
7 1 2000-05-15 A
8 1 2000-05-17 A
9 1 2000-05-16 C
10 1 2000-05-20 C
With this code:
library(lubridate)
library(dplyr)
mydata %>%
group_by(Id, drug) %>%
mutate(Date = ymd(Date),
Diff = as.numeric(Date - lag(Date, default = Date[1])),
startDate = min(Date, na.rm = T),
endDate = max(Date, na.rm = T),
startDate = ifelse(Diff > 100, Date, startdate)
)
Id Date drug Diff startDate endDate
<dbl> <date> <chr> <dbl> <dbl> <date>
1 1 2000-01-01 A 0 17257 2000-05-17
2 1 2000-01-05 A 4 17257 2000-05-17
3 1 2000-02-02 B 0 17257 2000-02-14
4 1 2000-02-12 B 10 17257 2000-02-14
5 1 2000-02-14 B 2 17257 2000-02-14
6 1 2000-05-13 A 129 11090 2000-05-17
7 1 2000-05-15 A 2 17257 2000-05-17
8 1 2000-05-17 A 2 17257 2000-05-17
9 1 2000-05-16 C 0 17257 2000-05-20
10 1 2000-05-20 C 4 17257 2000-05-20
At the last line of the mutate() call, the startDate column changes class from date to double, and I don't understand why.
I have tried origin = "1970-01-01", as.Date, ymd, ...
So my question is why does this happen?
CodePudding user response:
The reason ifelse() changes the class from date to double is documented in help("ifelse"):
The mode of the result may depend on the value of test (see the examples), and the class attribute (see oldClass) of the result is taken from test and may be inappropriate for the values selected from yes and no.
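In other words, the result is built from the logical test vector, which carries no Date class, so only the underlying day counts survive. A minimal base R sketch of that effect follows; the vectors d and test are made up for illustration and are not part of the original data:

```r
# Illustrative vectors (not from the question's data frame)
d <- as.Date(c("2000-01-01", "2000-05-13"))
test <- c(FALSE, TRUE)

# ifelse() fills a copy of `test` with values taken from `yes` and `no`,
# so the Date class is lost and plain numbers remain
res <- ifelse(test, d, d[1])

class(d)    # "Date"
class(res)  # "numeric"
res         # days since 1970-01-01: 10957 11090
```

The value 11090 is the same number that shows up in row 6 of the output in the question: it is 2000-05-13 expressed as days since 1970-01-01.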
dplyr::if_else() is probably more appropriate here, since it is stricter about types and keeps the Date class of its inputs:
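mydata %>%
  group_by(Id, drug) %>%
  mutate(Date = lubridate::ymd(Date),
         # days since the previous row within the same Id/drug group
         Diff = as.numeric(Date - lag(Date, default = Date[1])),
         startDate = min(Date, na.rm = TRUE),
         endDate = max(Date, na.rm = TRUE),
         # if_else() keeps the Date class, unlike base ifelse()
         startDate = if_else(Diff > 100, Date, startDate)
  )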
returns
# A tibble: 10 × 6
# Groups:   Id, drug [3]
      Id Date       drug   Diff startDate  endDate
   <dbl> <date>     <fct> <dbl> <date>     <date>
 1     1 2000-01-01 A         0 2000-01-01 2000-05-17
 2     1 2000-01-05 A         4 2000-01-01 2000-05-17
 3     1 2000-02-02 B         0 2000-02-02 2000-02-14
 4     1 2000-02-12 B        10 2000-02-02 2000-02-14
 5     1 2000-02-14 B         2 2000-02-02 2000-02-14
 6     1 2000-05-13 A       129 2000-05-13 2000-05-17
 7     1 2000-05-15 A         2 2000-01-01 2000-05-17
 8     1 2000-05-17 A         2 2000-01-01 2000-05-17
 9     1 2000-05-16 C         0 2000-05-16 2000-05-20
10     1 2000-05-20 C         4 2000-05-16 2000-05-20
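
If you would rather stay in base R, two alternatives should also work. This is only a sketch on made-up vectors (d and test are illustrative names, not from the question): either convert the numeric result of ifelse() back to a Date, or skip ifelse() and overwrite the fallback values by logical indexing.

```r
# Illustrative vectors (not from the question's data frame)
d <- as.Date(c("2000-01-01", "2000-05-13"))
test <- c(FALSE, TRUE)

# Option 1: let ifelse() drop the class, then convert the day counts back
fixed1 <- as.Date(ifelse(test, d, d[1]), origin = "1970-01-01")

# Option 2: start from the fallback value and overwrite where test is TRUE;
# rep() and `[<-` both have Date methods, so the class is kept throughout
fixed2 <- rep(d[1], length(d))
fixed2[test] <- d[test]

class(fixed1)  # "Date"
class(fixed2)  # "Date"
```

Inside the mutate() call, option 1 would amount to wrapping the ifelse() expression in as.Date(..., origin = "1970-01-01").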