I would like to select dates >= as.Date('2008-01-01') and also keep the NAs

Time:11-16

I have tried

  1. subset(df,df$date >= as.Date('2008-01-01'),na.rm = FALSE)
  2. subset(df,df$date >= as.Date('2008-01-01'),na.omit = FALSE)

Both of these drop all the people whose date is NA. Please suggest a way to keep them.


Answer:

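If you look at the ?subset help page, it has no arguments named na.rm or na.omit. Those aren't magic keywords: they are arguments that some (but not all) functions accept, so check a function's help page before relying on them. subset() silently ignores them. Its help page also says that rows where the condition evaluates to NA are taken as FALSE, and a missing date compared with as.Date('2008-01-01') gives NA. That is why everyone with a missing date disappears.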
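Here is a minimal sketch of both points, using a small made-up data frame (the id and date columns and their values are invented for illustration). The comparison is NA for the missing date, subset() drops that row, and adding na.rm = FALSE changes nothing.

```r
# Toy data (hypothetical): four people, one with a missing date
df <- data.frame(
  id   = 1:4,
  date = as.Date(c("2007-06-01", "2008-03-15", NA, "2009-01-01"))
)

# The comparison itself is NA wherever date is NA
cond <- df$date >= as.Date("2008-01-01")
print(cond)  # FALSE TRUE NA TRUE

# subset() keeps only rows where the condition is TRUE, so the NA row is dropped
kept <- subset(df, date >= as.Date("2008-01-01"))

# na.rm is not an argument of subset(); it is absorbed by `...` and ignored
kept_narm <- subset(df, date >= as.Date("2008-01-01"), na.rm = FALSE)

print(kept$id)                     # 2 4
print(identical(kept, kept_narm))  # TRUE
```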

Also, the point of using subset() rather than just [ is that, once you pass the data frame as the first argument, you can refer to its columns by name without the df$ prefix.

subset(df, date >= "2008-01-01" | is.na(date))

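This keeps rows where the date is on or after 2008-01-01 OR where the date is NA. For a missing date the comparison gives NA, but NA | TRUE is TRUE, so those rows survive. Comparing a Date column with the string "2008-01-01" works because R converts the string with as.Date() before comparing.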
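A small sketch to check this, again on a made-up data frame (the id and date columns and their values are hypothetical). It runs the subset() call and the equivalent [ indexing side by side; with [ you have to write df$ inside the brackets.

```r
# Toy data (hypothetical): four people, one with a missing date
df <- data.frame(
  id   = 1:4,
  date = as.Date(c("2007-06-01", "2008-03-15", NA, "2009-01-01"))
)

# Logical OR with NA: TRUE wins, otherwise the result stays NA
print(NA | TRUE)   # TRUE
print(NA | FALSE)  # NA

# subset(): columns are referenced by name inside the condition
kept <- subset(df, date >= as.Date("2008-01-01") | is.na(date))

# Equivalent with [: the df$ prefix is required
kept_bracket <- df[df$date >= as.Date("2008-01-01") | is.na(df$date), ]

print(kept$id)          # 2 3 4
print(kept_bracket$id)  # 2 3 4
```

Because is.na(date) is never NA itself, the combined condition contains no NA values, so both forms return the same rows.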

Answer:

Here is an example using filter() from the dplyr package instead of subset():

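library(dplyr)

# create a tibble with monthly dates on both sides of 2008-01-01
dat <- tibble(x = seq(as.Date('2007-09-01'), by = 'month', length.out = 10))

# replace 30% of the dates with NA (\(x) needs R >= 4.1)
set.seed(123)
df <- as.data.frame(lapply(dat, \(x) replace(x, sample(length(x), 0.3 * length(x)), NA)))

# keep dates on or after 2008-01-01, plus the NAs
# (use is.na(x), not is.na(.): inside the pipe, . is the whole data frame)
df %>% 
  filter(x >= as.Date('2008-01-01') | is.na(x))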