Home > Mobile >  I want to delete the IDs that have no information in the remaining columns
I want to delete the IDs that have no information in the remaining columns

Time:05-05

Here is a representation of my dataset:

Number<-c(1:10)
AA<-c(head(LETTERS,4), rep(NA,6))
BB<-c(head(letters,6), rep(NA,4))
CC<-c(1:6, rep(NA,4))
DD<-c(10:14, rep(NA,5))
EE<-c(3:8, rep(NA,4))
FF<-c(6:1, rep(NA,4))
mydata<-data.frame(Number,AA,BB,CC,DD,EE,FF)

I want to delete all the IDs (Number) that have no information in the remaining columns, automatically. I want to tell the function that if there is a value in Number but there is only NA in all the remaining columns, delete the row. I must have the dataframe below:

   Number   AA   BB CC DD EE FF
1       1    A    a  1 10  3  6
2       2    B    b  2 11  4  5
3       3    C    c  3 12  5  4
4       4    D    d  4 13  6  3
5       5 <NA>    e  5 14  7  2
6       6 <NA>    f  6 NA  8  1

CodePudding user response:

Another possible base R solution:

mydata[rowSums(is.na(mydata[,-1])) != ncol(mydata[,-1]), ]

Output

  Number   AA BB CC DD EE FF
1      1    A  a  1 10  3  6
2      2    B  b  2 11  4  5
3      3    C  c  3 12  5  4
4      4    D  d  4 13  6  3
5      5 <NA>  e  5 14  7  2
6      6 <NA>  f  6 NA  8  1

Or we could use apply:

mydata[!apply(mydata[,-1], 1, function(x) all(is.na(x))),]

CodePudding user response:

We can use if_all/if_all

library(dplyr)
mydata %>% 
   filter(if_any(-Number, complete.cases))

-output

  Number   AA BB CC DD EE FF
1      1    A  a  1 10  3  6
2      2    B  b  2 11  4  5
3      3    C  c  3 12  5  4
4      4    D  d  4 13  6  3
5      5 <NA>  e  5 14  7  2
6      6 <NA>  f  6 NA  8  1

or

mydata %>%
   filter(!if_all(-Number, is.na))

Or with base R

 subset(mydata, rowSums(!is.na(mydata[-1])) >0 )
  Number   AA BB CC DD EE FF
1      1    A  a  1 10  3  6
2      2    B  b  2 11  4  5
3      3    C  c  3 12  5  4
4      4    D  d  4 13  6  3
5      5 <NA>  e  5 14  7  2
6      6 <NA>  f  6 NA  8  1

CodePudding user response:

A possible solution, using janitor::remove_empty:

library(dplyr)
library(janitor)

inner_join(mydata, remove_empty(mydata[-1], which = "rows"))

#> Joining, by = c("AA", "BB", "CC", "DD", "EE", "FF")
#>   Number   AA BB CC DD EE FF
#> 1      1    A  a  1 10  3  6
#> 2      2    B  b  2 11  4  5
#> 3      3    C  c  3 12  5  4
#> 4      4    D  d  4 13  6  3
#> 5      5 <NA>  e  5 14  7  2
#> 6      6 <NA>  f  6 NA  8  1

CodePudding user response:

Try this:

df <- df[,colSums(is.na(df))<nrow(df)]

This makes a copy of your data though. If you have a large dataset then you can use:

Filter(function(x)!all(is.na(x)), df)

and depending on your approach you can use

library(data.table)
DT <- as.data.table(df)
DT[,which(unlist(lapply(DT, function(x)!all(is.na(x))))),with=F]

If you want to use a data.table which is usually a pretty solid go-to

  • Related