Subsetting a column where all values are the same character value in R

I am trying to identify data frame columns in which every value is the same single character value, "tree".

Here is an example dataset.

df <- data.frame(id = c(1,2,3,4,5),
                 var.1 = c(5,6,7,"tree",4),
                 var.2 = c("tree","tree","tree","tree","tree"),
                 var.3 = c(4,5,8,9,1))

> df
  id var.1 var.2 var.3
1  1     5  tree     4
2  2     6  tree     5
3  3     7  tree     8
4  4  tree  tree     9
5  5     4  tree     1

I would flag the var.2 variable since all of its values are "tree".

> flagged
[1] "var.2"

Any ideas? Thanks!

CodePudding user response：

Using dplyr, you could do

library(dplyr)

flagged <- df %>%
  select(where(~n_distinct(.x) == 1 && unique(.x) == "tree")) %>%
  names()

This selects every column that has exactly one distinct value, where that value is "tree", and then extracts the column names.
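
If the real data might contain missing values, the predicate above can return NA instead of TRUE/FALSE. Here is a minimal sketch of one way around that, assuming dplyr is installed; is_all_value is a hypothetical helper name of mine, not part of dplyr.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 var.1 = c(5, 6, 7, "tree", 4),
                 var.2 = c("tree", "tree", "tree", "tree", "tree"),
                 var.3 = c(4, 5, 8, 9, 1))

# Hypothetical helper: TRUE only when every value of x is exactly `value`.
# isTRUE() turns an NA result (caused by missing values) into FALSE.
is_all_value <- function(x, value = "tree") {
  isTRUE(all(as.character(x) == value))
}

# where() accepts a plain function as well as a formula
flagged <- df %>%
  select(where(is_all_value)) %>%
  names()

flagged
#> [1] "var.2"
```

Comparing against "tree" directly also makes the n_distinct() check unnecessary, since a column whose values all equal "tree" has only one distinct value anyway.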

CodePudding user response：

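For each column, check whether all elements equal the first element. Note that this flags any constant column, whatever the repeated value; to require "tree" specifically, compare against "tree" instead of x[1].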

df <- data.frame(id = c(1,2,3,4,5),
                 var.1 = c(5,6,7,"tree",4),
                 var.2 = c("tree","tree","tree","tree","tree"),
                 var.3 = c(4,5,8,9,1))


names(df)[sapply(df, function(x) all(x == x[1]))]
#> [1] "var.2"

Created on 2022-02-17 by the reprex package (v2.0.1)
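
To illustrate the difference between "constant column" and "column that is all tree", here is a rough base R sketch. The extra column var.4 is made up for the illustration and is not in the question's data; vapply() is used in place of sapply() only so the result is guaranteed to be logical.

```r
# Example data from the question, plus a hypothetical constant column var.4
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 var.1 = c(5, 6, 7, "tree", 4),
                 var.2 = c("tree", "tree", "tree", "tree", "tree"),
                 var.3 = c(4, 5, 8, 9, 1),
                 var.4 = c(7, 7, 7, 7, 7))

# Constant columns, whatever the repeated value is
constant_cols <- names(df)[vapply(df, function(x) all(x == x[1]), logical(1))]

# Only columns whose every value is "tree"
tree_cols <- names(df)[vapply(df, function(x) all(x == "tree"), logical(1))]

constant_cols
#> [1] "var.2" "var.4"
tree_cols
#> [1] "var.2"
```

Neither check handles NA values; if those can occur, wrapping the all(...) call in isTRUE() is one option.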