I need to remove columns that contain specific values.
Here is a sample dataset:
df <- data.frame(var1 = c(1, 2, 3, 4, 5),
                 var2 = c("a", "b", "c", "d", "e"),
                 var3 = c("[Name A]", "[Surname B]", NA, NA, NA),
                 var4 = c("[Name A]", "[Surname B]", NA, NA, NA))
> df
  var1 var2        var3        var4
1    1    a    [Name A]    [Name A]
2    2    b [Surname B] [Surname B]
3    3    c        <NA>        <NA>
4    4    d        <NA>        <NA>
5    5    e        <NA>        <NA>
var3 and var4 have [Name A] and [Surname B] and then NAs. When a column has this pattern, I need to remove it.
How can I get the desired dataset below?
> df1
  var1 var2
1    1    a
2    2    b
3    3    c
4    4    d
5    5    e
CodePudding user response:
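index <- vapply(
  df,
  FUN = function(x) {
    # TRUE = keep the column, FALSE = drop it.
    # Modify this condition to match exactly what you want, e.g. certain number of NA, remainder of values are NA, etc.
    # isTRUE() treats an NA comparison as "no match", so index never contains NA.
    !isTRUE(x[[1]] == "[Name A]" & x[[2]] == "[Surname B]")
  },
  FUN.VALUE = logical(1)
)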
index
#  var1  var2  var3  var4
#  TRUE  TRUE FALSE FALSE
df[, index, drop = FALSE]  # drop = FALSE keeps a data frame even if only one column remains
#   var1 var2
# 1    1    a
# 2    2    b
# 3    3    c
# 4    4    d
# 5    5    e
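The comment in the code above suggests tightening the condition, for example by requiring that every remaining value is NA. A minimal sketch of that stricter check follows. The helper name matches_pattern is hypothetical (it is not from the answer), and the sketch assumes the pattern is exactly "[Name A]", "[Surname B]", then only NA.

```r
df <- data.frame(var1 = c(1, 2, 3, 4, 5),
                 var2 = c("a", "b", "c", "d", "e"),
                 var3 = c("[Name A]", "[Surname B]", NA, NA, NA),
                 var4 = c("[Name A]", "[Surname B]", NA, NA, NA))

# Hypothetical helper: TRUE when a column is "[Name A]", "[Surname B]",
# followed by nothing but NA.
matches_pattern <- function(x) {
  x <- as.character(x)  # also handles factor and numeric columns
  length(x) >= 2 &&
    identical(x[1:2], c("[Name A]", "[Surname B]")) &&
    all(is.na(x[-(1:2)]))
}

# One logical per column: TRUE means "drop this column".
drop <- vapply(df, matches_pattern, logical(1))

# Keep the columns that do not match; drop = FALSE preserves the data frame.
df1 <- df[, !drop, drop = FALSE]
df1
```

Unlike the check on the first two rows alone, this version keeps a column that starts with the two markers but carries real data further down.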
CodePudding user response:
base: Filter()
Filter(\(x) !all(c('[Name A]', '[Surname B]', NA) %in% x), df)
dplyr: select() + where()
library(dplyr)
df %>%
  select(where(~ !all(c('[Name A]', '[Surname B]', NA) %in% .x)))
Output
#   var1 var2
# 1    1    a
# 2    2    b
# 3    3    c
# 4    4    d
# 5    5    e
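One thing to keep in mind with the %in% test: it only checks that all three values occur somewhere in the column, in any order and at any position. The sketch below illustrates this on a made-up data frame (df2 and its columns are assumptions for illustration, not part of the question). It uses function(x) instead of \(x) so it also runs on R versions before 4.1.

```r
# Made-up data: "note" holds the same three values, but out of order
# and mixed with a real value.
df2 <- data.frame(id = 1:4,
                  note = c(NA, "[Surname B]", "keep me", "[Name A]"))

# Same predicate as in the answer above.
result <- Filter(function(x) !all(c("[Name A]", "[Surname B]", NA) %in% x), df2)

# "note" is dropped even though it does not follow the
# "[Name A]", "[Surname B]", NA, NA, ... layout.
names(result)
```

If position matters, a check on x[1:2] plus all(is.na(...)) on the remaining values is the stricter alternative.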