R: How to drop the corresponding rows if there is a 1 in columns with a column name containing a spe-CodePudding

I would like to drop a whole row if there is a 1 in any column having a name that ends with "rem" or "nocut". Sorry, I can't list the whole data set here because it's too large. Here is an example of my data:

Sequence| #1C_cut  | #2C_nocut | #1C_rem | #3C_cut | #3C_rem | #4C_nocut|
R1      |    0     |     0     |    1    |    1    |    0    |     0    |
R2      |    0     |     1     |    0    |    0    |    0    |     1    |
R3      |    1     |     0     |    0    |    0    |    0    |     0    |
R4      |    0     |     0     |    0    |    1    |    0    |     0    |
R5      |    1     |     0     |    0    |    1    |    0    |     0    |

What I want:

Sequence| #1C_cut  | #2C_nocut | #1C_rem | #3C_cut | #3C_rem | #4C_nocut|
R3      |    1     |     0     |    0    |    0    |    0    |     0    |
R4      |    0     |     0     |    0    |    1    |    0    |     0    |
R5      |    1     |     0     |    0    |    1    |    0    |     0    |

I have an idea of combining select() and filter() function in dplyr to solve this problem, but I don't know how to do that......

CodePudding user response：

Borrowing the idea from @jrcalabrese. Without manipulating the data (mutate()), you can filter() as such:

df %>%  
  filter(if_all(ends_with(c("_rem", "_nocut")), ~ .x != 1))

# A tibble: 3 × 7
  Sequence `1C_cut` `2C_nocut` `1C_rem` `3C_cut` `3C_rem` `4C_nocut`
  <chr>       <dbl>      <dbl>    <dbl>    <dbl>    <dbl>      <dbl>
1 R3              1          0        0        0        0          0
2 R4              0          0        0        1        0          0
3 R5              1          0        0        1        0          0

CodePudding user response：

One tidyverse solution would be to find all the 1s contained in variables that end with _rem or _nocut (using across(ends_with))), replace those 1s with NA using ifelse, and then use drop_na to remove those rows entirely.

This code works for the dataset you included, but one major limitation of this code is that it assumes that there are no other NA values in your dataset.

library(tidyverse)

df <- read_delim("
Sequence|1C_cut|2C_nocut|1C_rem|3C_cut|3C_rem|4C_nocut
R1|0|0|1|1|0|0
R2|0|1|0|0|0|1
R3|1|0|0|0|0|0
R4|0|0|0|1|0|0
R5|1|0|0|1|0|0
")

df %>%
  mutate(across( ( ends_with("_rem") | ends_with("_nocut") ), ~ifelse(.x == 1, NA, .x))) %>%
  drop_na() %>% as.data.frame()
#>   Sequence 1C_cut 2C_nocut 1C_rem 3C_cut 3C_rem 4C_nocut
#> 1       R3      1        0      0      0      0        0
#> 2       R4      0        0      0      1      0        0
#> 3       R5      1        0      0      1      0        0

UPDATE: Actually, it would just be better to conditionally create another variable (drop_me) and then use that to figure out whether a row should be dropped. This code uses the same across(ends_with) stuff as before, but uses it with case_when instead of ifelse.

df %>%
  mutate(drop_me = case_when(if_any( ( ends_with("_rem") | ends_with("_nocut") ), ~ .x == 1) ~ 1,
                             TRUE ~ 0)) %>%
  filter(drop_me == 0)
#> # A tibble: 3 × 8
#>   Sequence `1C_cut` `2C_nocut` `1C_rem` `3C_cut` `3C_rem` `4C_nocut` drop_me
#>   <chr>       <dbl>      <dbl>    <dbl>    <dbl>    <dbl>      <dbl>   <dbl>
#> 1 R3              1          0        0        0        0          0       0
#> 2 R4              0          0        0        1        0          0       0
#> 3 R5              1          0        0        1        0          0       0

CodePudding user response：

library(tidyverse)
df %>%
  filter(!if_any(matches("_rem|_nocut"), ~.x == 1))
# A tibble: 3 × 7
  Sequence `1C_cut` `2C_nocut` `1C_rem` `3C_cut` `3C_rem` `4C_nocut`
  <chr>       <dbl>      <dbl>    <dbl>    <dbl>    <dbl>      <dbl>
1 R3              1          0        0        0        0          0
2 R4              0          0        0        1        0          0
3 R5              1          0        0        1        0          0