I have a dataframe that contains three values: 0, 1, and ?. The 0 and 1 values are character values and not numeric. I want to subset the dataframe so as to exclude all the columns with all 0 values. So in the example dataframe below, I want to create a new dataframe with columns x2 through x5. How do I do this in R when the values are characters and not numeric?
# x1 x2 x3 x4 x5
# 1 0 0 1 1 1
# 2 0 ? 1 0 1
# 3 0 0 1 0 1
# 4 0 ? 1 1 0
# 5 0 0 1 ? 1
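For anyone who wants to reproduce this, the printed example can be rebuilt roughly as follows. This is a sketch that assumes the object is called df (the name the answers use) and that every column is stored as character, as described above.

```r
# Rebuild the example data; all values are character strings, not numbers.
df <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  stringsAsFactors = FALSE  # keep character columns on R versions before 4.0
)

# Inspect the column types to confirm they are character.
str(df)
```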
CodePudding user response:
You could select columns where not all values are equal to "0" like this:
library(dplyr)
df %>%
select(where(~!all(. == "0")))
#> x2 x3 x4 x5
#> 1 0 1 1 1
#> 2 ? 1 0 1
#> 3 0 1 0 1
#> 4 ? 1 1 0
#> 5 0 1 ? 1
Created on 2023-02-04 with reprex v2.0.2
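If dplyr is not available, the same per-column test can probably be written with base R's Filter(). The following is only a sketch under the assumption that the data look like the example in the question; the name kept is made up for illustration.

```r
# Example data from the question, stored as character columns (assumed).
df <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# Filter() calls the predicate on each column and keeps the columns
# for which it returns TRUE, i.e. columns that are not entirely "0".
kept <- Filter(function(col) !all(col == "0"), df)

names(kept)
```

Note that a column containing NA would make all() return NA for that column, so real data with missing values may need all(col == "0", na.rm = TRUE) or similar.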
CodePudding user response:
You can use colSums to count the number of values other than "0" in each column, and then keep only the columns where that count is greater than zero:
df[, colSums(df != "0") > 0, drop = FALSE]
This will give you a new data frame with only the columns that contain at least one value other than "0", so a column made up of "0" and "?" (such as x2) is kept. Note that df != "0" creates a logical matrix with TRUE where the entry is not "0" and FALSE otherwise, and colSums sums the values in each column, giving the number of non-"0" entries in that column. drop = FALSE keeps the result a data frame even if only one column remains.
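To see the intermediate steps, here is a small sketch of a colSums approach on the question's example data that counts entries different from "0". The names not_zero, counts, and result are made up for illustration, and the data frame is assumed to hold character columns.

```r
# Example data from the question, stored as character columns (assumed).
df <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever the entry is something other than "0".
not_zero <- df != "0"

# Number of non-"0" entries per column.
counts <- colSums(not_zero)

# Keep columns with at least one non-"0" entry; drop = FALSE guards
# against the result collapsing to a vector when one column remains.
result <- df[, counts > 0, drop = FALSE]

names(result)
```

Comparing against "0" rather than "1" matters here: x2 holds no "1" at all, yet it should be kept because it is not entirely "0".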