Home > Blockchain >  Filter a specific range of dates across mult-year data set
Filter a specific range of dates across mult-year data set

Time:11-05

I have a data frame that consist of data across multiple years. I am trying to filter specific dates in each of these years, but I am unsure of how to do this. In the code below, I have filtered the data for a single year for data that falls in two different ranges (i.e., df2 and df3). How could I modify this code to work on all of the years in my data set?

I want the code to filter the IDs with all of the dates in the range, and not to include any data that is missing a day from the range.

library(dplyr)
library(lubridate)

ID <-  rep(c("A","B","C", "D"), 5000)
date <-  rep_len(seq(dmy("01-01-2010"), dmy("31-01-2015"), by = "days"), 5000)
x <-  runif(length(date), min = 60000, max = 80000)
y <-  runif(length(date), min = 800000, max = 900000)

df <- data.frame(date = date, 
                 x = x,
                 y =y,
                 ID)


df2 <- df %>% 
  filter(date >= "2010-01-01", date <= "2010-01-31")

df3 <- df %>% 
  filter(date >= "2010-07-01", date <= "2010-07-31")

CodePudding user response:

Convert the 'date' to Date class (ymd), do a group_split into a list by the year, then create the two datasets by filtering on the date created with make_date, and return a list of 'df2' and 'df3' for each 'year' (nested list)

library(dplyr)
library(lubridate)
library(purrr)
out <- df %>%
      mutate(date = ymd(date)) %>% 
      group_split(yr = year(date), .keep = FALSE) %>% 
      map(~ {
        df2 <- .x %>% 
         filter(date >= make_date(year(first(date)), 1, 1), 
              date <= make_date(year(first(date)), 1, 31))
        df3 <-  .x %>% 
          filter(date >= make_date(year(first(date)), 7, 1),
          date <= make_date(year(first(date)), 7, 31))
    list(df2, df3)
} )

Another option is to create another column with 'year' same (considering the leap year as well)

library(stringr)
df1 <- df %>%
     mutate(date1 = ymd(str_replace(date, '^\\d{4}', '2020'))) 

and then use the OP's code to subset on 'date1'

  • Related