How to remove NA from certain columns only using R

I have data with many columns, but I want to remove rows that have NA values in specific columns only. Suppose I have a df such as the one below:

structure(list(id = c(1, 1, 2, 2), admission = c("2001/01/01", 
"2001/03/01", "NA", "2005/01/01"), discharged = c("2001/01/07", 
"NA", "NA", "2005/01/03")), class = "data.frame", row.names = c(NA, 
-4L))

I want to exclude the records for each id that contain NAs and get a df such as the one below:

structure(list(id2 = c(1, 2), admission2 = c("2001/01/01", "2005/01/01"
), discharged2 = c("2001/01/07", "2005/01/03")), class = "data.frame", row.names = c(NA, 
-2L))

CodePudding user response：

The NAs in the data are the character string "NA", not real missing values. Convert them to NA with is.na<- and then use na.omit:

is.na(df1) <- df1 == "NA"
df1 <- na.omit(df1)
df1
  id  admission discharged
1  1 2001/01/01 2001/01/07
4  2 2005/01/01 2005/01/03
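
As a minimal sketch of why the conversion step matters, the snippet below checks a single character vector before and after the conversion. The vector name `x` and the counter names are only illustrative; the values are the `discharged` column from the question.

```r
# The string "NA" is ordinary character data, not a missing value.
x <- c("2001/01/07", "NA", "NA", "2005/01/03")
n_missing_before <- sum(is.na(x))   # counts real NA values only

# is.na<- marks the selected positions as real NA values.
is.na(x) <- x == "NA"
n_missing_after <- sum(is.na(x))

print(c(before = n_missing_before, after = n_missing_after))
```

Until the strings are converted, na.omit and complete.cases see nothing to drop.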

If only specific columns should be checked for NA, use

df1[complete.cases(df1[c("admission", "discharged")]),]
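
A self-contained sketch of the column-specific approach follows. The `notes` column and the `cols` and `result` names are hypothetical additions for illustration; they show that missing values outside the checked columns do not cause a row to be dropped.

```r
# Data from the question, plus a hypothetical `notes` column
# that is allowed to contain missing values.
df1 <- data.frame(
  id         = c(1, 1, 2, 2),
  admission  = c("2001/01/01", "2001/03/01", "NA", "2005/01/01"),
  discharged = c("2001/01/07", "NA", "NA", "2005/01/03"),
  notes      = c(NA, "follow-up", NA, NA)
)

# Columns that must be complete for a row to be kept.
cols <- c("admission", "discharged")

# Convert the "NA" strings to real NA, but only in those columns.
df1[cols][df1[cols] == "NA"] <- NA

# Keep rows with no NA in the selected columns; `notes` is ignored.
result <- df1[complete.cases(df1[cols]), ]
print(result)
```

Rows 1 and 4 are kept even though their `notes` value is NA, because only `admission` and `discharged` are passed to complete.cases.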

CodePudding user response：

Base R option:

Convert the character "NA" strings to real NA values, then use complete.cases to keep only the rows without any NA:

df[df=="NA"] <- NA
dfNA <- complete.cases(df)
df[dfNA,]

  id  admission discharged
1  1 2001/01/01 2001/01/07
4  2 2005/01/01 2005/01/03

CodePudding user response：

We can also do it like this, using drop_na from tidyr:

library(tidyr)
df1[df1=="NA"] <- NA
df1_new <- df1 %>% drop_na()
df1_new
  id  admission discharged
1  1 2001/01/01 2001/01/07
2  2 2005/01/01 2005/01/03
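
drop_na also accepts column names, which restricts the check to those columns. The sketch below assumes tidyr is installed; the `notes` column and the `cols` and `df1_new` names are hypothetical and only there to show that NAs in other columns are left alone.

```r
library(tidyr)

# Data from the question, plus a hypothetical `notes` column
# that is allowed to contain missing values.
df1 <- data.frame(
  id         = c(1, 1, 2, 2),
  admission  = c("2001/01/01", "2001/03/01", "NA", "2005/01/01"),
  discharged = c("2001/01/07", "NA", "NA", "2005/01/03"),
  notes      = c(NA, "follow-up", NA, NA)
)

# Convert the "NA" strings to real NA in the columns of interest.
cols <- c("admission", "discharged")
df1[cols][df1[cols] == "NA"] <- NA

# Drop rows with NA in admission or discharged only.
df1_new <- drop_na(df1, admission, discharged)
print(df1_new)
```

Calling drop_na(df1) with no column names would instead drop every row here, since each row has an NA in at least one column.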