How do I select columns of a dataframe with character values in R based on the values of the column?

Time:02-04

I have a dataframe in which every cell holds one of three values: 0, 1, or ?. The 0 and 1 values are stored as character, not numeric. I want to subset the dataframe to exclude every column whose values are all 0. So in the example dataframe below, I want to create a new dataframe with columns x2 through x5. How do I do this in R when the values are characters and not numeric?

#   x1 x2 x3 x4 x5
# 1  0  0  1  1  1
# 2  0  ?  1  0  1
# 3  0  0  1  0  1
# 4  0  ?  1  1  0
# 5  0  0  1  ?  1

CodePudding user response:

You could select columns where not all values are equal to 0 like this:

library(dplyr)
df %>%
  select(where(~!all(. == "0")))
#>   x2 x3 x4 x5
#> 1  0  1  1  1
#> 2  ?  1  0  1
#> 3  0  1  0  1
#> 4  ?  1  1  0
#> 5  0  1  ?  1

Created on 2023-02-04 with reprex v2.0.2
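
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example data and applies the same selection. The construction of df is an assumption, since the question shows only the printed table, and the sketch assumes dplyr 1.0.0 or later so that where() can be used inside select().

```r
library(dplyr)

# Assumed reconstruction of the example data; every column is character
df <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# Keep a column unless every one of its values is the string "0"
result <- df %>%
  select(where(~ !all(.x == "0")))

names(result)
#> [1] "x2" "x3" "x4" "x5"
```

Because the comparison is against the string "0", the "?" entries in x2 count as non-zero, which is why x2 is kept.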

CodePudding user response:

You can use colSums to count the number of non-zero values in each column, and then subset the data frame based on the columns with non-zero counts:

df[, colSums(df != "0") > 0, drop = FALSE]

This will give you a new data frame with only the columns that contain at least one value other than "0". Note that df != "0" will create a logical matrix with TRUE values where the entries are not "0" (including "?") and FALSE otherwise, and colSums will sum up the TRUE values in each column, giving the number of non-zero entries in that column. The drop = FALSE argument keeps the result a data frame even if only one column remains.
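
As a rough sketch of what the intermediate steps look like on the example data, the following counts the entries in each column that differ from "0" and keeps the columns where that count is positive. The way df is built here is an assumption, since the question shows only the printed table, and the names not_zero, counts, and result are illustrative.

```r
# Assumed reconstruction of the example data; every column is character
df <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever an entry is not the string "0"
not_zero <- df != "0"

# Number of non-"0" entries in each column
counts <- colSums(not_zero)
counts
#> x1 x2 x3 x4 x5 
#>  0  2  5  3  5 

# Keep columns with at least one non-"0" entry;
# drop = FALSE keeps the result a data frame
result <- df[, counts > 0, drop = FALSE]
names(result)
#> [1] "x2" "x3" "x4" "x5"
```

Only x1 has a count of zero, so it is the only column dropped.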
