Home > database >  select second lowest date in group
select second lowest date in group

Time:12-26

I want to select the second lowest DATE grouped per ID (if more than one DATE) and then mutate the result into an new column (NEW).

DF<-tibble::tribble(
  ~ID,        ~DATE,
   1L, "2001-01-01",
   2L, "2002-02-02",
   2L, "2002-02-02",
   3L, "2004-04-04",
   3L, "2005-05-05",
   4L, "2006-06-06",
   4L, "2006-06-06",
   4L, "2007-07-07"
  )

# A tibble: 8 × 2
     ID DATE      
  <int> <chr>     
1     1 2001-01-01
2     2 2002-02-02
3     2 2002-02-02
4     3 2004-04-04
5     3 2005-05-05
6     4 2006-06-06
7     4 2006-06-06
8     4 2007-07-07

Desired output:

# A tibble: 8 × 3
     ID DATE       NEW       
  <int> <chr>      <chr>     
1     1 2001-01-01 NA        
2     2 2002-02-02 NA        
3     2 2002-02-02 NA        
4     3 2004-04-04 2005-05-05
5     3 2005-05-05 2005-05-05
6     4 2006-06-06 2007-07-07
7     4 2006-06-06 2007-07-07
8     4 2007-07-07 2007-07-07

Best regards, H

CodePudding user response:

Just add unique to only consider unique dates when selecting the second date.

library(tidyverse)

DF %>%
  mutate(DATE = as.Date(DATE)) %>%
  group_by(ID) %>%
  arrange(ID, DATE) %>%
  mutate(NEW = unique(DATE)[2])

Output

     ID DATE       NEW       
  <int> <date>     <date>    
1     1 2001-01-01 NA        
2     2 2002-02-02 NA        
3     2 2002-02-02 NA        
4     3 2004-04-04 2005-05-05
5     3 2005-05-05 2005-05-05
6     4 2006-06-06 2007-07-07
7     4 2006-06-06 2007-07-07
8     4 2007-07-07 2007-07-07

CodePudding user response:

Just throw out the uniques, i.e. length(unique(x)) == 1.

transform(DF, NEW=ave(DATE, ID, FUN=\(x) {
  if (NROW(x) == 1 || length(unique(x)) == 1)  NA
  else x[order(as.Date(x)) == 2]
}))
#   ID       DATE        NEW
# 1  1 2001-01-01       <NA>
# 2  2 2002-02-02       <NA>
# 3  2 2002-02-02       <NA>
# 4  3 2004-04-04 2005-05-05
# 5  3 2005-05-05 2005-05-05
# 6  4 2006-06-06 2006-06-06
# 7  4 2006-06-06 2006-06-06
# 8  4 2007-07-07 2006-06-06

For the rest, using ave and order with case handling to produce NA's like in my other answer.

CodePudding user response:

An option with data.table

library(data.table)
library(dplyr)
setDT(DF)[order(ID, DATE), NEW := nth(unique(DATE), 2), by = ID]

-output

> DF
   ID       DATE        NEW
1:  1 2001-01-01       <NA>
2:  2 2002-02-02       <NA>
3:  2 2002-02-02       <NA>
4:  3 2004-04-04 2005-05-05
5:  3 2005-05-05 2005-05-05
6:  4 2006-06-06 2007-07-07
7:  4 2006-06-06 2007-07-07
8:  4 2007-07-07 2007-07-07
  •  Tags:  
  • r
  • Related