I have a data frame that consist of data across multiple years. I am trying to filter specific dates in each of these years, but I am unsure of how to do this. In the code below, I have filtered the data for a single year for data that falls in two different ranges (i.e., df2
and df3
). How could I modify this code to work on all of the years in my data set?
I want the code to filter the ID
s with all of the dates in the range, and not to include any data that is missing a day from the range.
library(dplyr)
library(lubridate)
ID <- rep(c("A","B","C", "D"), 5000)
date <- rep_len(seq(dmy("01-01-2010"), dmy("31-01-2015"), by = "days"), 5000)
x <- runif(length(date), min = 60000, max = 80000)
y <- runif(length(date), min = 800000, max = 900000)
df <- data.frame(date = date,
x = x,
y =y,
ID)
df2 <- df %>%
filter(date >= "2010-01-01", date <= "2010-01-31")
df3 <- df %>%
filter(date >= "2010-07-01", date <= "2010-07-31")
CodePudding user response:
Convert the 'date' to Date
class (ymd
), do a group_split
into a list
by the year
, then create the two datasets by filter
ing on the date created with make_date
, and return a list
of 'df2' and 'df3' for each 'year' (nested list)
library(dplyr)
library(lubridate)
library(purrr)
out <- df %>%
mutate(date = ymd(date)) %>%
group_split(yr = year(date), .keep = FALSE) %>%
map(~ {
df2 <- .x %>%
filter(date >= make_date(year(first(date)), 1, 1),
date <= make_date(year(first(date)), 1, 31))
df3 <- .x %>%
filter(date >= make_date(year(first(date)), 7, 1),
date <= make_date(year(first(date)), 7, 31))
list(df2, df3)
} )
Another option is to create another column with 'year' same (considering the leap
year as well)
library(stringr)
df1 <- df %>%
mutate(date1 = ymd(str_replace(date, '^\\d{4}', '2020')))
and then use the OP's code to subset on 'date1'