How would I remove columns from a data frame when both rows for that column have non-zero values?
For example, I want to change the table from the following
Dogs | Cats | Snakes | Elephants |
---|---|---|---|
1 | 0 | 1 | 3 |
2 | 1 | 0 | 2 |
to the following
Cats | Snakes |
---|---|
0 | 1 |
1 | 0 |
The other columns are removed because both of their rows hold non-zero numbers. If either of the two rows has a zero, the entire column is retained. It does not matter which row contains the zero.
I tried dplyr and if/else statements, but most of those approaches are based on a single condition in the column being met.
CodePudding user response:
You may use colSums here:
df[, colSums(df!=0) != nrow(df)]
Cats Snakes
1 0 1
2 1 0
The logic here is to retain any column in which the count of values not equal to zero differs from the total number of rows. Put another way, this retains any column that has at least one zero value.
Data:
df <- data.frame(Dogs=c(1,2), Cats=c(0,1), Snakes=c(1,0), Elephants=c(3,2))
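To make the colSums logic above easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The variable names (nonzero, nonzero_counts, keep, result) are introduced only for illustration, and drop = FALSE is an optional addition that is not part of the original answer.

```r
# Example data from the question
df <- data.frame(Dogs = c(1, 2), Cats = c(0, 1), Snakes = c(1, 0), Elephants = c(3, 2))

# Step 1: logical matrix that is TRUE wherever a value is non-zero
nonzero <- df != 0

# Step 2: number of non-zero values in each column
nonzero_counts <- colSums(nonzero)

# Step 3: keep a column only if not every one of its rows is non-zero
keep <- nonzero_counts != nrow(df)

# drop = FALSE (an addition, not in the answer) keeps the result a data frame
# even if only a single column survives the filter
result <- df[, keep, drop = FALSE]
print(result)
```

Without drop = FALSE, base R would simplify a single surviving column to a plain vector, which may or may not be what you want.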
CodePudding user response:
Here are a few other options:
#1. Base R Filter
Filter(function(x) any(x == 0), df)
#2. purrr::keep
purrr::keep(df, ~any(.x == 0))
#3. purrr::discard
purrr::discard(df, ~all(.x != 0))
All of these return the following output:
# Cats Snakes
#1 0 1
#2 1 0
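The keep and discard variants above give the same result because "at least one value is zero" is the logical negation of "all values are non-zero". The following base R sketch checks that equivalence on the example data; the names has_zero and all_nonzero are illustrative only.

```r
# Example data from the question
df <- data.frame(Dogs = c(1, 2), Cats = c(0, 1), Snakes = c(1, 0), Elephants = c(3, 2))

# Predicate used with Filter() and keep(): does the column contain a zero?
has_zero <- vapply(df, function(x) any(x == 0), logical(1))

# Predicate used with discard(): are all values in the column non-zero?
all_nonzero <- vapply(df, function(x) all(x != 0), logical(1))

# The two predicates are exact complements, so keeping on one
# is the same as discarding on the other
stopifnot(identical(has_zero, !all_nonzero))

# Selecting columns with the logical vector
df[has_zero]
```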
CodePudding user response:
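Here is a dplyr solution using select along with any:
We select only the columns that contain at least one value less than or equal to 0. Note that, unlike a test for == 0, this also retains columns that contain negative values: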
library(dplyr)
df %>%
select(where(~ any(. <= 0)))
Cats Snakes
1 0 1
2 1 0
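The <= 0 predicate and the == 0 predicate agree on the question's data, but they can differ once negative values appear. The sketch below illustrates this with base R's Filter so that it runs without extra packages; the same predicates could be placed inside where(). The data frame df_neg and its Owls column are hypothetical and not part of the original question.

```r
# Hypothetical data: Owls has a negative value but no zero
df_neg <- data.frame(Dogs = c(1, 2), Cats = c(0, 1), Owls = c(-1, 3))

# "At least one value <= 0": keeps Cats and Owls
le_zero <- Filter(function(x) any(x <= 0), df_neg)

# "At least one value == 0": keeps only Cats
eq_zero <- Filter(function(x) any(x == 0), df_neg)

names(le_zero)
names(eq_zero)
```

If the data can never be negative (counts, for example), the two predicates are interchangeable; otherwise == 0 matches the question's wording more closely.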
Benchmark of the answers provided so far:
library(microbenchmark)
library(ggplot2)  # provides the autoplot() generic
library(dplyr)    # provides %>%, select() and where()
mbm <- microbenchmark(
  base_TimBiegeleisen = df[, colSums(df!=0) != nrow(df)],
  dplyr_TarJae = df %>% select(where(~ any(. <= 0))),
  base_Ronak_Shah = Filter(function(x) any(x == 0), df),
  purr_keep_Ronak_Shah = purrr::keep(df, ~any(.x == 0)),
  purr_discard_Ronak_Shah = purrr::discard(df, ~all(.x != 0)),
  times=50
)
mbm
autoplot(mbm)