Home > Back-end >  How to filter data.frame based on date in R
How to filter data.frame based on date in R

Time:09-06

library(tidyverse)
dat <- data.frame(id = c(1, 2, 3, 4, 5),
                  class = c("A", "A", "B", "C", "C"),
                  value = c(12, 34, 24, 24, 11),
                  date = c(as.Date("02/27/92", "%m/%d/%y"), as.Date("02/27/99", "%m/%d/%y"),
                           as.Date("03/01/91", "%m/%d/%y"), as.Date("02/27/99", "%m/%d/%y"), as.Date("06/28/91", "%m/%d/%y")))

> dat
  id class value       date
1  1     A    12 1992-02-27
2  2     A    34 1999-02-27
3  3     B    24 1991-03-01
4  4     C    24 1999-02-27
5  5     C    11 1991-06-28

I would like to do the following:

  • for class = A, keep the rows where value > 10
  • for class = C, keep the rows where value > 11
  • for each distinct class, keep the row with the largest (or more recent) date

Here's my attempt and error:

> dat %>% filter((class %in% "A" & value > 10) | (class %in% "C" & value > 11)) %>%
  group_by(class) %>% filter(max(date))
Error in `filter()`:
! Problem while computing `..1 = max(date)`.
x Input `..1` must be a logical vector, not a date.
ℹ The error occurred in group 1: class = "A".
Run `rlang::last_error()` to see where the error occurred.

CodePudding user response:

max is not logical. We may need date == max(date) to get the logical vector

library(dplyr)
dat %>% 
  filter((class %in% "A" & value > 10) | (class %in% "C" & value > 11)) %>%
  group_by(class) %>% 
  filter(date == max(date))

-output

# A tibble: 2 × 4
# Groups:   class [2]
     id class value date      
  <dbl> <chr> <dbl> <date>    
1     2 A        34 1999-02-27
2     4 C        24 1999-02-27

Or use slice_max

dat %>% 
 filter((class %in% "A" & value > 10) | (class %in% "C" & value > 11)) %>%
 group_by(class) %>%
 slice_max(n = 1, order_by = date)
# A tibble: 2 × 4
# Groups:   class [2]
     id class value date      
  <dbl> <chr> <dbl> <date>    
1     2 A        34 1999-02-27
2     4 C        24 1999-02-27

CodePudding user response:

filter(max(date) should be filter(date == max(date))

  • Related