Below is a sample dataset:
A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|
123 | xyz | abc | xyz | abc | xyz | abc | xyz |
123 | xyz | abc | xyz | abc | NA | abc | xyz |
123 | NA | abc | xyz | abc | xyz | abc | xyz |
123 | xyz | NA | xyz | abc | xyz | abc | abc |
345 | lmn | NA | xyz | abc | xyz | abc | efg |
345 | lmn | rst | xyz | abc | xyz | abc | NA |
567 | lmn | rst | xyz | abc | xyz | abc | NA |
567 | lmn | rst | NA | abc | xyz | abc | rst |
567 | lmn | rst | xyz | abc | xyz | NA | rst |
I want to create a column called "filter" where:
- if the row has no NA, the value is 1
- if the row has an NA in columns B to D, the value is 2
- if the row has an NA in columns F to H, the value is 3
I was thinking of using mutate() and case_when(), but I do not know how to go about it.
CodePudding user response:
if_any() and if_all() from dplyr can apply the same predicate function to a selection of columns and combine the results into a single logical vector.
library(dplyr)
df %>%
  mutate(filter = case_when(
    !if_any(A:H, is.na) ~ 1L,
    if_any(B:D, is.na) ~ 2L,
    if_any(F:H, is.na) ~ 3L
  ))
# # A tibble: 9 × 9
#       A B     C     D     E     F     G     H     filter
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <int>
# 1   123 xyz   abc   xyz   abc   xyz   abc   xyz        1
# 2   123 xyz   abc   xyz   abc   NA    abc   xyz        3
# 3   123 NA    abc   xyz   abc   xyz   abc   xyz        2
# 4   123 xyz   NA    xyz   abc   xyz   abc   abc        2
# 5   345 lmn   NA    xyz   abc   xyz   abc   efg        2
# 6   345 lmn   rst   xyz   abc   xyz   abc   NA         3
# 7   567 lmn   rst   xyz   abc   xyz   abc   NA         3
# 8   567 lmn   rst   NA    abc   xyz   abc   rst        2
# 9   567 lmn   rst   xyz   abc   xyz   NA    rst        3
Data
df <- read.table(text = "A B C D E F G H
123 xyz abc xyz abc xyz abc xyz
123 xyz abc xyz abc NA abc xyz
123 NA abc xyz abc xyz abc xyz
123 xyz NA xyz abc xyz abc abc
345 lmn NA xyz abc xyz abc efg
345 lmn rst xyz abc xyz abc NA
567 lmn rst xyz abc xyz abc NA
567 lmn rst NA abc xyz abc rst
567 lmn rst xyz abc xyz NA rst", header = TRUE)
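For comparison, the same rules can be sketched in base R without dplyr. This is only an illustration, not part of the original answer: the helper names na, no_na, na_b_d and na_f_h are made up here, and the nested ifelse() mirrors the top-to-bottom order of the case_when() conditions above.

```r
# Same data as in the answer above.
df <- read.table(text = "A B C D E F G H
123 xyz abc xyz abc xyz abc xyz
123 xyz abc xyz abc NA abc xyz
123 NA abc xyz abc xyz abc xyz
123 xyz NA xyz abc xyz abc abc
345 lmn NA xyz abc xyz abc efg
345 lmn rst xyz abc xyz abc NA
567 lmn rst xyz abc xyz abc NA
567 lmn rst NA abc xyz abc rst
567 lmn rst xyz abc xyz NA rst", header = TRUE)

# Logical matrix: TRUE wherever a cell is NA (hypothetical helper names).
na <- is.na(df)
no_na  <- rowSums(na) == 0                     # no NA anywhere in the row
na_b_d <- rowSums(na[, c("B", "C", "D")]) > 0  # at least one NA in B to D
na_f_h <- rowSums(na[, c("F", "G", "H")]) > 0  # at least one NA in F to H

# Conditions are checked in the same order as in case_when();
# rows matching none of them get NA.
df$filter <- ifelse(no_na, 1L,
              ifelse(na_b_d, 2L,
               ifelse(na_f_h, 3L, NA_integer_)))

df$filter
# 1 3 2 2 2 3 3 2 3
```

Because rowSums() works on the whole logical matrix at once, this stays vectorised, much like if_any().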
CodePudding user response:
You can use c_across() after rowwise() to specify the columns.
library(dplyr)
df %>%
  rowwise() %>%
  mutate(filter = case_when(
    sum(is.na(c_across(B:H))) == 0 ~ "1",
    sum(is.na(c_across(B:D))) >= 1 ~ "2",
    sum(is.na(c_across(F:H))) >= 1 ~ "3"
  )) %>%
  ungroup()
# A tibble: 9 × 9
      A B     C     D     E     F     G     H     filter
  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   123 xyz   abc   xyz   abc   xyz   abc   xyz   1
2   123 xyz   abc   xyz   abc   NA    abc   xyz   3
3   123 NA    abc   xyz   abc   xyz   abc   xyz   2
4   123 xyz   NA    xyz   abc   xyz   abc   abc   2
5   345 lmn   NA    xyz   abc   xyz   abc   efg   2
6   345 lmn   rst   xyz   abc   xyz   abc   NA    3
7   567 lmn   rst   xyz   abc   xyz   abc   NA    3
8   567 lmn   rst   NA    abc   xyz   abc   rst   2
9   567 lmn   rst   xyz   abc   xyz   NA    rst   3
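The sample data never contains a row with NAs in both B to D and F to H, or a row whose only NA is in column E. The sketch below checks how case_when() handles those cases, using the if_any() conditions from the first answer. The edge data frame is hypothetical and not part of the question's data.

```r
library(dplyr)

# Hypothetical rows that do not occur in the question's data:
# row 1 has NAs in both B and G, row 2 has an NA only in E.
edge <- data.frame(
  A = c(1L, 2L),
  B = c(NA, "x"),
  C = c("x", "x"),
  D = c("x", "x"),
  E = c("x", NA),
  F = c("x", "x"),
  G = c(NA, "x"),
  H = c("x", "x")
)

res <- edge %>%
  mutate(filter = case_when(
    !if_any(A:H, is.na) ~ 1L,
    if_any(B:D, is.na) ~ 2L,
    if_any(F:H, is.na) ~ 3L
  ))

res$filter
# 2 NA
```

case_when() uses the first condition that is TRUE, so a row with NAs in both ranges gets 2. A row matching none of the conditions gets NA. If such rows should get a different value, an extra condition would be needed.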