Home > Back-end >  How to filter by multiple range of dates in R?
How to filter by multiple range of dates in R?

Time:02-18

Thank you, experts for previous answers (How to filter by range of dates in R?)

I am still having some problems dealing with the data.

Example:

id  q   date
a   1   01/01/2021
a   1   01/01/2021
a   1   21/01/2021
a   1   21/01/2021
a   1   12/02/2021
a   1   12/02/2021
a   1   12/02/2021
a   1   12/02/2021

My idea is to eliminate the observations that have more than 3 "units" in a period of 30 days. That is, if "a" has a unit "q" on "12/02/2021" [dd/mm]yyyy]: (a) if between 12/01/2021 and 12/02/2021 there are already 3 observations it must be deleted . (b) If there are less than 3 this one must remain.

My expected result is:

p   q   date
a   1   01/01/2021
a   1   01/01/2021
a   1   21/01/2021
a   1   12/02/2021
a   1   12/02/2021
a   1   12/02/2021

With this code:

df <- df %>%
  mutate(day = dmy(data))%>%
  group_by(p) %>%
  arrange(day, .by_group = TRUE) %>%
  mutate(diff = day - first(day)) %>%
  mutate(row = row_number()) %>%
  filter(row <= 3 | !diff < 30)

But the result is:

P   Q   DATE        DAY       DIFF  ROW
a   1   1/1/2021    1/1/2021    0   1
a   1   1/1/2021    1/1/2021    0   2
a   1   21/1/2021   21/1/2021   20  3
a   1   12/2/2021   12/2/2021   42  5
a   1   12/2/2021   12/2/2021   42  6
a   1   12/2/2021   12/2/2021   42  7
a   1   12/2/2021   12/2/2021   42  8

The main problem is that the diff variable must count days in periods of 30 days from the last day of the previous 30-days period - not since the first observation day.

Any help? Thanks

CodePudding user response:

Using floor_date it is quite straighforward:

library(lubridate)
library(dplyr)
df %>% 
  group_by(floor = floor_date(date, '30 days')) %>% 
  slice_head(n = 3) %>% 
  ungroup() %>% 
  select(-floor)

# A tibble: 6 x 3
  id        q date      
  <chr> <int> <date>    
1 a         1 2021-01-01
2 a         1 2021-01-01
3 a         1 2021-01-21
4 a         1 2021-02-12
5 a         1 2021-02-12
6 a         1 2021-02-12

data

df <- read.table(header = T, text = "id  q   date
a   1   01/01/2021
a   1   01/01/2021
a   1   21/01/2021
a   1   21/01/2021
a   1   12/02/2021
a   1   12/02/2021
a   1   12/02/2021
a   1   12/02/2021")

df$date<-as.Date(df$date, format = "%d/%m/%Y")
  • Related