Removing columns that contain certain values in R

Time:09-28

I need to remove columns that contain specific values.

Here is a sample dataset:

df <- data.frame(var1 = c(1,2,3,4,5),
                 var2 = c("a","b","c","d","e"),
                 var3 = c("[Name A]","[Surname B]",NA, NA, NA),
                 var4 = c("[Name A]","[Surname B]",NA, NA, NA))


> df
  var1 var2        var3        var4
1    1    a    [Name A]    [Name A]
2    2    b [Surname B] [Surname B]
3    3    c        <NA>        <NA>
4    4    d        <NA>        <NA>
5    5    e        <NA>        <NA>

var3 and var4 contain [Name A] and [Surname B], followed by NAs. When a column has this pattern, I need to remove it.

How can I get the desired dataset below?

> df1
  var1 var2
1    1    a
2    2    b
3    3    c
4    4    d
5    5    e

CodePudding user response:

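index <- sapply(
  df,
  FUN = function(x){
    # Keep the column (TRUE) unless its first two values are the markers.
    # isTRUE() avoids an NA result when the first values are missing.
    # Modify this test to match exactly what you want, e.g. a certain
    # number of NAs, the remainder of the values being NA, etc.
    !isTRUE(x[[1]] == "[Name A]" & x[[2]] == "[Surname B]")
  }
)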



index
# var1  var2  var3  var4 
#  TRUE  TRUE FALSE FALSE 

# drop = FALSE keeps a data frame even if only one column remains
df[, index, drop = FALSE]
#  var1 var2
# 1    1    a
# 2    2    b
# 3    3    c
# 4    4    d
# 5    5    e
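
The comment in the code above suggests tightening the condition. Here is a minimal sketch of one stricter version, assuming "this pattern" means the first two values are exactly [Name A] and [Surname B] and every remaining value is NA. The helper name is_placeholder_col is hypothetical and not part of the original answer.

```r
df <- data.frame(var1 = c(1, 2, 3, 4, 5),
                 var2 = c("a", "b", "c", "d", "e"),
                 var3 = c("[Name A]", "[Surname B]", NA, NA, NA),
                 var4 = c("[Name A]", "[Surname B]", NA, NA, NA))

# Hypothetical helper: TRUE when a column starts with the two marker
# values and holds nothing but NA afterwards.
is_placeholder_col <- function(x) {
  x <- as.character(x)  # also handles factor columns
  length(x) >= 2 &&
    identical(x[1:2], c("[Name A]", "[Surname B]")) &&
    all(is.na(x[-(1:2)]))
}

# vapply() guarantees one logical value per column
to_drop <- vapply(df, is_placeholder_col, logical(1))
df1 <- df[, !to_drop, drop = FALSE]
```

With the sample data, df1 should keep only var1 and var2. A column that merely starts with the two markers but has other non-NA values later would be kept under this assumption.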

CodePudding user response:

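  • base: Filter()
# The \(x) lambda shorthand requires R >= 4.1; use function(x) on older versions
Filter(\(x) !all(c('[Name A]', '[Surname B]', NA) %in% x), df)
  • dplyr: select() where()
library(dplyr)

df %>%
  select(where(~ !all(c('[Name A]', '[Surname B]', NA) %in% .x)))
Output (same for both approaches)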
#   var1 var2
# 1    1    a
# 2    2    b
# 3    3    c
# 4    4    d
# 5    5    e
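
Both versions above rely on %in% treating NA as a value that can be matched, which == does not do. A small illustration, using a vector shaped like var3 (the variable names here are only for demonstration):

```r
x <- c("[Name A]", "[Surname B]", NA, NA, NA)

# %in% is built on match(), which can match NA against NA
na_found <- NA %in% x

# == propagates NA instead of returning TRUE or FALSE
na_equal <- x[3] == NA

# TRUE only when both markers and at least one NA occur somewhere in x
has_pattern <- all(c("[Name A]", "[Surname B]", NA) %in% x)
```

Note that this test ignores position: any column containing both markers and at least one NA is removed, wherever those values appear.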