I have a data set with the columns name, Date and earliest_date, in which only some names have an earliest_date. For each name, I want to remove all rows whose Date falls after that name's earliest_date, and leave untouched the rows where earliest_date is NA. Since different names have different earliest_date values, I am pretty sure I can't use filter() with a single fixed date. Any help will be much appreciated.
Part of the data is below:
dput(mydata[1:10,])
structure(list(name = c("a", "b", "c",
"d", "e", "f", "g",
"a", "h", "i"), Date = structure(c(13214,
17634, 15290, 18046, 16326, 18068, 10234, 12647, 15485, 15182
), class = "Date"), earliest_date = structure(c(12647, NA, NA,
NA, NA, NA, NA, 12647, NA, 15552), class = "Date")), row.names = c(NA,
10L), class = "data.frame")
Desired output:
The first row is removed because its Date falls after its earliest_date:
dput(mydata[2:10,])
structure(list(name = c("b", "c",
"d", "e", "f", "g",
"a", "h", "i"), Date = structure(c(17634, 15290,
18046, 16326, 18068, 10234, 12647, 15485, 15182), class = "Date"),
earliest_date = structure(c(NA, NA, NA, NA, NA, NA, 12647,
NA, 15552), class = "Date")), row.names = 2:10, class = "data.frame")
CodePudding user response:
This may help:
library(dplyr)
mydata %>%
  filter(is.na(earliest_date) | Date <= earliest_date)
name Date earliest_date
1 b 2018-04-13 <NA>
2 c 2011-11-12 <NA>
3 d 2019-05-30 <NA>
4 e 2014-09-13 <NA>
5 f 2019-06-21 <NA>
6 g 1998-01-08 <NA>
7 a 2004-08-17 2004-08-17
8 h 2012-05-25 <NA>
9 i 2011-07-27 2012-07-31
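The condition is evaluated row by row, so each row is compared against its own earliest_date and no fixed date is needed. The is.na() part matters because a comparison with a missing earliest_date yields NA rather than TRUE. As a rough sketch of the same logic in base R (the sample data is rebuilt from the dput() output in the question; the helper names not_after, keep and result are only illustrative):

```r
# Sample data from the question, rebuilt from the day counts in the dput() output
mydata <- data.frame(
  name = c("a", "b", "c", "d", "e", "f", "g", "a", "h", "i"),
  Date = as.Date(c(13214, 17634, 15290, 18046, 16326, 18068, 10234,
                   12647, 15485, 15182), origin = "1970-01-01"),
  earliest_date = as.Date(c(12647, NA, NA, NA, NA, NA, NA, 12647, NA, 15552),
                          origin = "1970-01-01")
)

# Element-wise comparison: NA wherever earliest_date is missing
not_after <- mydata$Date <= mydata$earliest_date

# TRUE | NA evaluates to TRUE, so rows without an earliest_date are kept
keep <- is.na(mydata$earliest_date) | not_after

# Only the first row (Date after earliest_date) is dropped
result <- mydata[keep, ]
```

Building the logical vector first makes it easy to inspect which rows are dropped before subsetting.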
CodePudding user response:
Or try:
library(data.table)
# setDT() converts mydata to a data.table by reference;
# columns can then be referred to by name inside [ ]
setDT(mydata)[is.na(earliest_date) | Date <= earliest_date]