Why does date format change to double

Time:04-28

This question is related to an earlier one: "Group by id and drug (with dates <100 days of each other) take the earliest and latest date".

The dataset is:

mydata = data.frame (Id =c(1,1,1,1,1,1,1,1,1,1),
                     Date = c("2000-01-01","2000-01-05","2000-02-02", "2000-02-12", 
                              "2000-02-14","2000-05-13", "2000-05-15", "2000-05-17", 
                              "2000-05-16", "2000-05-20"),
                     drug = c("A","A","B","B","B","A","A","A","C","C"))

   Id       Date drug
1   1 2000-01-01    A
2   1 2000-01-05    A
3   1 2000-02-02    B
4   1 2000-02-12    B
5   1 2000-02-14    B
6   1 2000-05-13    A
7   1 2000-05-15    A
8   1 2000-05-17    A
9   1 2000-05-16    C
10  1 2000-05-20    C

With this code:

library(lubridate)
library(dplyr)

mydata %>% 
  group_by(Id, drug) %>% 
  mutate(Date = ymd(Date),
         Diff = as.numeric(Date - lag(Date, default = Date[1])),
         startDate = min(Date, na.rm = TRUE),
         endDate = max(Date, na.rm = TRUE),
         startDate =  ifelse(Diff > 100, Date, startDate)
         )

      Id Date       drug   Diff startDate endDate   
   <dbl> <date>     <chr> <dbl>     <dbl> <date>    
 1     1 2000-01-01 A         0     17257 2000-05-17
 2     1 2000-01-05 A         4     17257 2000-05-17
 3     1 2000-02-02 B         0     17257 2000-02-14
 4     1 2000-02-12 B        10     17257 2000-02-14
 5     1 2000-02-14 B         2     17257 2000-02-14
 6     1 2000-05-13 A       129     11090 2000-05-17
 7     1 2000-05-15 A         2     17257 2000-05-17
 8     1 2000-05-17 A         2     17257 2000-05-17
 9     1 2000-05-16 C         0     17257 2000-05-20
10     1 2000-05-20 C         4     17257 2000-05-20

The last line of the mutate() call changes the class of the startDate column from date to double, and I don't understand why.

I have tried origin = "1970-01-01", as.Date, ymd ...

So my question is why does this happen?

Answer:

The reason for ifelse() changing the class from date to double is documented in help("ifelse"):

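"The mode of the result may depend on the value of test (see the examples), and the class attribute (see oldClass) of the result is taken from test and may be inappropriate for the values selected from yes and no."

Here test is the logical vector Diff > 100, which has no class attribute, so the Date class is dropped and only the underlying number of days since 1970-01-01 remains.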
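A minimal base R sketch of this behaviour, using two dates from the dataset above (the names d, res, and restored are only illustrative). It shows ifelse() returning plain day counts, and as.Date() with an explicit origin turning them back into dates afterwards:

```r
# Two dates taken from the example data
d <- as.Date(c("2000-01-01", "2000-05-13"))

# ifelse() builds its result from the logical test, so the Date class is lost
res <- ifelse(c(TRUE, TRUE), d, d)
class(res)  # "numeric"
res         # 10957 11090 (days since 1970-01-01)

# Converting back restores the dates
restored <- as.Date(res, origin = "1970-01-01")
class(restored)  # "Date"
```

Wrapping the whole ifelse() call in as.Date(..., origin = "1970-01-01") inside mutate() should therefore also work, although the conversion then has to be remembered every time.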

dplyr::if_else() is probably more appropriate here, because it preserves the class of its true and false arguments:

mydata %>% 
  group_by(Id, drug) %>% 
  mutate(Date = lubridate::ymd(Date),
         Diff = as.numeric(Date - lag(Date, default = Date[1])),
         startDate = min(Date, na.rm = TRUE),
         endDate = max(Date, na.rm = TRUE),
         # if_else() keeps the Date class of Date and startDate
         startDate =  if_else(Diff > 100, Date, startDate)
  )

returns

# A tibble: 10 × 6
# Groups:   Id, drug [3]
      Id Date       drug   Diff startDate  endDate   
   <dbl> <date>     <fct> <dbl> <date>     <date>    
 1     1 2000-01-01 A         0 2000-01-01 2000-05-17
 2     1 2000-01-05 A         4 2000-01-01 2000-05-17
 3     1 2000-02-02 B         0 2000-02-02 2000-02-14
 4     1 2000-02-12 B        10 2000-02-02 2000-02-14
 5     1 2000-02-14 B         2 2000-02-02 2000-02-14
 6     1 2000-05-13 A       129 2000-05-13 2000-05-17
 7     1 2000-05-15 A         2 2000-01-01 2000-05-17
 8     1 2000-05-17 A         2 2000-01-01 2000-05-17
 9     1 2000-05-16 C         0 2000-05-16 2000-05-20
10     1 2000-05-20 C         4 2000-05-16 2000-05-20
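If if_else() is not an option, help("ifelse") itself suggests assigning into a copy by logical index, which keeps the attributes of the original vector. A rough base R sketch for the drug A rows only (d, gap, and start are illustrative names, not part of the original code):

```r
# Dates for Id 1, drug A from the example data
d <- as.Date(c("2000-01-01", "2000-01-05", "2000-05-13",
               "2000-05-15", "2000-05-17"))

# Days since the previous date in the group (0 for the first row)
gap <- c(0, as.numeric(diff(d)))

# Start from the group minimum, then overwrite rows that follow a long gap
start <- rep(min(d), length(d))
start[gap > 100] <- d[gap > 100]

class(start)  # "Date"
start
```

Because start is created as a Date vector and only some of its elements are overwritten, no class conversion happens at any point.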