Home > Software design >  Is it possible to select columns in r based on any value in the column?
Is it possible to select columns in r based on any value in the column?

Time:06-22

I want to subset my df to include only columns that include a certain value in any row.

for example, if I have:

year = c(1990,1991,1992,1993,1994,1995,1996,1997,1998,1999)
apple = c(1,4,6,8,9,9,2,4,7,4)
orange = c(7,1,5,5,2,1,7,1,3,8)
banana = c(9,9,4,8,1,3,6,7,5,9)
lemon = c(8,3,3,3,2,5,6,7,2,4)
df = data.frame(year,apple,orange,banana,lemon)

df

I want to select only the columns that have a 9 anywhere in the column so that my df would become only include the apple and banana columns.

Is this possible? All the answers I've found so far only enable selecting columns based on the column name, but I want to select based on cell values within the column. Thank you!

CodePudding user response:

We can pass a function in select within where - check whether the column is numeric and if that is numeric, check whether there are any value equal to 9. In addition can change the any(.x ==9) to 9 %in% .x.

library(dplyr)
df %>% 
  select(where(~is.numeric(.x) && any(.x == 9)))

-output

 apple banana
1      1      9
2      4      9
3      6      4
4      8      8
5      9      1
6      9      3
7      2      6
8      4      7
9      7      5
10     4      9

CodePudding user response:

base R option using Filter:

Filter(function(x) any(x == 9), df)

Output:

   apple banana
1      1      9
2      4      9
3      6      4
4      8      8
5      9      1
6      9      3
7      2      6
8      4      7
9      7      5
10     4      9

CodePudding user response:

This one is long and too verbose and the perfect answer is already provided by @akrun. Therefore here an alternative approach: What we do here is mutate across each column except year check if they contain a 9 if so then put the name of column to a new column called x,

then use any_of with select

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(-year, ~case_when(. == 9 ~ cur_column()), .names = 'new_{col}')) %>%
  unite(x, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  select(any_of(x))


   banana apple
1       9     1
2       9     4
3       4     6
4       8     8
5       1     9
6       3     9
7       6     2
8       7     4
9       5     7
10      9     4

CodePudding user response:

Probably we can use colMeans like below, where the mean value should be non-zero if there exists at least one 9 in the column

> df[colMeans(df == 9) > 0]
   apple banana
1      1      9
2      4      9
3      6      4
4      8      8
5      9      1
6      9      3
7      2      6
8      4      7
9      7      5
10     4      9
  • Related