I have the following data with two columns and 15 rows:
data_1 <- structure(list(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130,
NA, 925, NA, 181, 182, 188, NA), column_2 = c(7, NA, 1, 1, 1,
3, 7, NA, 1, NA, 1, NA, 1, 1, 1)), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
column_1 | column_2 | |
---|---|---|
1 | 120 | 7 |
2 | 130 | NA |
3 | NA | 1 |
4 | NA | 1 |
5 | NA | 1 |
6 | 130 | 3 |
7 | 182 | 7 |
8 | 130 | NA |
9 | NA | 1 |
10 | 925 | NA |
11 | NA | 1 |
12 | 181 | NA |
13 | 182 | 1 |
14 | 188 | 1 |
15 | NA | 1 |
- By using filters, I would like to keep the oberservations with the following values in column_1:
NA
, 130, 181, 182, 188 - Furthermore, I would like to remove all observations with the entry 7 in column_2
So far, this works by the following code:
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7)
Now I want to add an additional filter: If the value is 130 in column_1 and in column_2 it is a NA
, then remove the oberservation (so the rows 2 and 8 in data_1). How could I do this? What are the best ways to achieve this conditional filter? I have tried the following commands so far, which do not lead to the desired result:
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2)))
The result here is that only the entry 130, 3 is kept.
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2), TRUE ~ is.na(column_2)))
Now two entries remain: 130, 3 and 181, NA
I have also tried the following two commands:
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(if (column_2 == 130) !is.na(column_2))
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% {if (column_2 == 130) filter(., !is.na(column_2))}
CodePudding user response:
Are you looking for something like this?
library(tidyverse)
data_1 |>
filter(case_when(
is.na(column_1) ~ T,
column_1 == 130 & is.na(column_2 ) ~ F,
column_2 == 7 ~ F,
column_1 %in% c(130, 181, 182, 188) ~ T,
T ~ F
))
#> # A tibble: 10 x 2
#> column_1 column_2
#> <dbl> <dbl>
#> 1 NA 1
#> 2 NA 1
#> 3 NA 1
#> 4 130 3
#> 5 NA 1
#> 6 NA 1
#> 7 181 NA
#> 8 182 1
#> 9 188 1
#> 10 NA 1
I just added all of your conditions to one big case_when
. Make sure to map the statements to T
and F
so that the filter works correctly. In this case, when the condition is mapped to T
you will keep the row and when it is F
you will remove the row.
CodePudding user response:
I would only add that structure(list()) may be needlessly high level here unless it is done for another reason. Simpler would be:
data.frame(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1)))
# or
tibble::tibble(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))