Home > Mobile >  How to implement a conditional filter in R dplyr filter
How to implement a conditional filter in R dplyr filter

Time:05-16

I have the following data with two columns and 15 rows:

data_1 <- structure(list(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, 
NA, 925, NA, 181, 182, 188, NA), column_2 = c(7, NA, 1, 1, 1, 
3, 7, NA, 1, NA, 1, NA, 1, 1, 1)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))
column_1 column_2
1 120 7
2 130 NA
3 NA 1
4 NA 1
5 NA 1
6 130 3
7 182 7
8 130 NA
9 NA 1
10 925 NA
11 NA 1
12 181 NA
13 182 1
14 188 1
15 NA 1
  • By using filters, I would like to keep the oberservations with the following values in column_1: NA, 130, 181, 182, 188
  • Furthermore, I would like to remove all observations with the entry 7 in column_2

So far, this works by the following code:

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7)

Now I want to add an additional filter: If the value is 130 in column_1 and in column_2 it is a NA, then remove the oberservation (so the rows 2 and 8 in data_1). How could I do this? What are the best ways to achieve this conditional filter? I have tried the following commands so far, which do not lead to the desired result:

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2)))

The result here is that only the entry 130, 3 is kept.

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2), TRUE ~ is.na(column_2)))

Now two entries remain: 130, 3 and 181, NA

I have also tried the following two commands:

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(if (column_2 == 130) !is.na(column_2))
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% {if (column_2 == 130) filter(., !is.na(column_2))}

CodePudding user response:

Are you looking for something like this?

library(tidyverse)


data_1 |>
  filter(case_when(
    is.na(column_1) ~ T,
    column_1 == 130 & is.na(column_2 ) ~ F,
    column_2 == 7 ~ F,
    column_1 %in% c(130, 181, 182, 188) ~ T,
    T ~ F
  ))
#> # A tibble: 10 x 2
#>    column_1 column_2
#>       <dbl>    <dbl>
#>  1       NA        1
#>  2       NA        1
#>  3       NA        1
#>  4      130        3
#>  5       NA        1
#>  6       NA        1
#>  7      181       NA
#>  8      182        1
#>  9      188        1
#> 10       NA        1

I just added all of your conditions to one big case_when. Make sure to map the statements to T and F so that the filter works correctly. In this case, when the condition is mapped to T you will keep the row and when it is F you will remove the row.

CodePudding user response:

I would only add that structure(list()) may be needlessly high level here unless it is done for another reason. Simpler would be:

data.frame(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA), 
           column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1)))

# or

tibble::tibble(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA), 
               column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))
  • Related