I am trying to identify data frame columns in which every value is the single character value "tree".
Here is an example dataset.
df <- data.frame(id = c(1,2,3,4,5),
var.1 = c(5,6,7,"tree",4),
var.2 = c("tree","tree","tree","tree","tree"),
var.3 = c(4,5,8,9,1))
> df
  id var.1 var.2 var.3
1  1     5  tree     4
2  2     6  tree     5
3  3     7  tree     8
4  4  tree  tree     9
5  5     4  tree     1
I would flag the var.2 variable, since all of its values are "tree".
> flagged
[1] "var.2"
Any ideas? Thanks!
CodePudding user response:
Using dplyr, you could do
library(dplyr)
flagged <- df %>%
  select(where(~n_distinct(.x) == 1 && unique(.x) == "tree")) %>%
  names()
where you select all columns that only have one distinct value which equals "tree", and then extract the column names.
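If it helps to see the predicate on its own, here is a minimal sketch of the same test written as a named function in base R. The name is_all_tree is made up for illustration, and the extra NA check is my assumption about how a column containing only missing values should be treated (not flagged).

```r
# Hypothetical helper (not part of dplyr): TRUE when a column has exactly
# one distinct value and that value is "tree".
is_all_tree <- function(x) {
  vals <- unique(x)
  # && short-circuits, so vals == "tree" is only evaluated when length(vals) == 1
  length(vals) == 1 && !is.na(vals) && vals == "tree"
}

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 var.1 = c(5, 6, 7, "tree", 4),
                 var.2 = c("tree", "tree", "tree", "tree", "tree"),
                 var.3 = c(4, 5, 8, 9, 1))

# Apply the predicate to every column and keep the names where it is TRUE
flagged <- names(df)[vapply(df, is_all_tree, logical(1))]
flagged
#> [1] "var.2"
```

Since where() accepts a plain function as well as a formula, the same helper should also work as select(where(is_all_tree)) in the pipeline above.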
CodePudding user response:
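For each column, check if all elements equal the first element. Note that this flags any constant column, whatever its value; to require "tree" specifically, compare against "tree" instead of x[1].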
df <- data.frame(id = c(1,2,3,4,5),
var.1 = c(5,6,7,"tree",4),
var.2 = c("tree","tree","tree","tree","tree"),
var.3 = c(4,5,8,9,1))
names(df)[sapply(df, function(x) all(x == x[1]))]
#> [1] "var.2"
Created on 2022-02-17 by the reprex package (v2.0.1)
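A stricter variant, as a rough sketch: compare against "tree" directly and wrap the result in isTRUE() so a column containing NA is not flagged. The function name flag_tree, the data frame df2, and the columns var.4 and var.5 are all invented here to show the difference; they are not in the question's data.

```r
# Hypothetical helper: names of columns whose values all equal `value`.
# isTRUE() turns the NA that all() returns for columns with missing values
# into FALSE, so those columns are not flagged.
flag_tree <- function(data, value = "tree") {
  keep <- vapply(data, function(x) isTRUE(all(x == value)), logical(1))
  names(data)[keep]
}

# Made-up test data: var.4 is constant but not "tree", var.5 contains an NA
df2 <- data.frame(var.2 = rep("tree", 5),
                  var.4 = rep(7, 5),
                  var.5 = c("tree", NA, "tree", "tree", "tree"))

result <- flag_tree(df2)
result
#> [1] "var.2"
```

Whether a column with missing values should count as "all tree" depends on your data; swapping isTRUE(all(x == value)) for all(x == value, na.rm = TRUE) would ignore the NAs instead.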