Home > database >  select second lowest date by group and mutate new column
select second lowest date by group and mutate new column

Time:12-26

I want to select the second lowest DATE grouped per ID (if more than one DATE) and then mutate the result into an new column (NEW).

   DF<-tibble::tribble(
      ~ID,        ~DATE, ~TYPE,
       1L, "2001-01-01",   "P",
       2L, "2002-02-02",   "P",
       2L, "2005-05-05",   "R",
       3L, "2001-10-10",   "R",
       4L, "2008-08-08",   "P",
       4L, "2010-10-10",   "R"
      )

# A tibble: 6 × 3
     ID DATE       TYPE 
  <int> <date>     <chr>
1     1 2001-01-01 P    
2     2 2002-02-02 P    
3     2 2005-05-05 R    
4     3 2001-10-10 R    
5     4 2008-08-08 P    
6     4 2010-10-10 R  

Desired output:

    ID DATE       TYPE  NEW       
  <int> <date>     <chr> <chr>     
1     1 2001-01-01 P     NA        
2     2 2002-02-02 P     2005-05-05
3     2 2005-05-05 R     2005-05-05
4     3 2001-10-10 R     NA        
5     4 2008-08-08 P     2010-10-10
6     4 2010-10-10 R     2010-10-10

I´ve tried the code below, but can´t figure how to mutate NEW column with NA if just one DATE.

DF%>%group_by(ID)%>%arrange(DATE)%>%slice_head(n=2)%>%slice_max(order_by=DATE,n=1)%>%pull(DATE)

Best regards, H

CodePudding user response:

If your dates are arranged in order, you can select the second DATE as DATE[2] within a group.

library(tidyverse)

DF %>%
  mutate(DATE = as.Date(DATE)) %>%
  group_by(ID) %>%
  arrange(ID, DATE) %>%
  mutate(NEW = DATE[2])

Output

     ID DATE       TYPE  NEW       
  <int> <date>     <chr> <date>    
1     1 2001-01-01 P     NA        
2     2 2002-02-02 P     2005-05-05
3     2 2005-05-05 R     2005-05-05
4     3 2001-10-10 R     NA        
5     4 2008-08-08 P     2010-10-10
6     4 2010-10-10 R     2010-10-10

CodePudding user response:

Using ave and order with a case handling to produce the NA's.

transform(DF, NEW=ave(DATE, ID, FUN=\(x) {
  if (NROW(x) == 1)  NA
  else x[order(as.Date(x)) == 2]
  }))
#   ID       DATE TYPE        NEW
# 1  1 2001-01-01    P       <NA>
# 2  2 2002-02-02    P 2005-05-05
# 3  2 2005-05-05    R 2005-05-05
# 4  3 2001-10-10    R       <NA>
# 5  4 2008-08-08    P 2010-10-10
# 6  4 2010-10-10    R 2010-10-10

Data:

DF <- structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 4L), DATE = c("2001-01-01", 
"2002-02-02", "2005-05-05", "2001-10-10", "2008-08-08", "2010-10-10"
), TYPE = c("P", "P", "R", "R", "P", "R")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))
  •  Tags:  
  • r
  • Related