Select Data - First entry set time period (1 year) R

Time:09-14

I have a dataset on a group of individuals that was collected starting at different times for each individual.

I need to subset the data to the first year after each individual's first entry, something like this pseudocode: myData[myDate >= "first entry" & myDate <= "first entry" + "1 year"]

Example data:

df_date <- data.frame( Name = c("Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim",
                                "Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue"),
                       Dates = c("2010-1-1", "2010-2-2", "2010-3-5","2010-4-17","2010-5-20",
                                 "2010-6-29","2010-7-6","2010-8-9","2010-9-16","2010-10-28","2010-11-16","2010-12-28","2011-1-16","2011-2-28",
                                 "2010-4-1", "2010-5-2", "2010-6-5","2010-7-17","2010-8-20",
                                 "2010-9-29","2010-10-6","2010-11-9","2012-12-16","2011-1-28","2011-2-28","2011-3-28","2011-2-28","2011-3-28"),
                       Event = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) )

In the desired output, Jim would have data from 1/1/2010 to 12/28/2010, Sue from 4/1/2010 to 3/28/2011, and so on. The actual dataset has more than 20 samples, all starting at different times.

CodePudding user response:

Use a combination of tidyverse and lubridate functions:

library(tidyverse)
library(lubridate)

df_date %>%
  mutate(Dates = as_datetime(Dates)) %>%
  group_by(Name) %>%
  arrange(Dates, .by_group = TRUE) %>%
  filter(Dates <= first(Dates) + duration(1, units = "year"))

CodePudding user response:

Similar to Martin C. Arnold's answer, here is another approach based on dplyr and lubridate. min(Dates) + years(1) adds one year to the earliest date within each group.

library(dplyr)
library(lubridate)

df_date2 <- df_date %>%
  mutate(Dates = ymd(Dates)) %>%
  group_by(Name) %>%
  filter(Dates <= min(Dates) + years(1)) %>%
  ungroup()
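
For comparison, the same per-person filter can be sketched in base R with no packages. This is only an illustration under stated assumptions: the small data frame df and the helper keep_first_year are hypothetical names made up for this example (they are not from the question or the answers), and the cutoff is treated as inclusive, matching the <= used above.

```r
# Hypothetical toy data, much smaller than the asker's df_date
df <- data.frame(
  Name  = c("Jim", "Jim", "Jim", "Sue", "Sue", "Sue"),
  Dates = as.Date(c("2010-01-01", "2010-12-28", "2011-01-16",
                    "2010-04-01", "2011-03-28", "2011-04-02")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows of one person's data that fall
# within one calendar year of that person's first entry (inclusive)
keep_first_year <- function(g) {
  start  <- min(g$Dates)
  # seq() on a Date with by = "1 year" steps by calendar years;
  # the second element is the first entry plus one year
  cutoff <- seq(start, by = "1 year", length.out = 2)[2]
  g[g$Dates <= cutoff, , drop = FALSE]
}

# Split by person, filter each piece, and stack the pieces back together
result <- do.call(rbind, lapply(split(df, df$Name), keep_first_year))
rownames(result) <- NULL
result
```

With the real data, the Dates column would first need converting with as.Date(), since it is stored as character strings in df_date.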