Dealing with row-wise NAs in R

Time:01-19

Below is a sample dataset:

A B C D E F G H
123 xyz abc xyz abc xyz abc xyz
123 xyz abc xyz abc NA abc xyz
123 NA abc xyz abc xyz abc xyz
123 xyz NA xyz abc xyz abc abc
345 lmn NA xyz abc xyz abc efg
345 lmn rst xyz abc xyz abc NA
567 lmn rst xyz abc xyz abc NA
567 lmn rst NA abc xyz abc rst
567 lmn rst xyz abc xyz NA rst

I want to create a column called "filter" whose value is:

  1. 1 if the row has no NA;
  2. 2 if the row has an NA in columns B to D;
  3. 3 if the row has an NA in columns F to H.

I was thinking of using mutate() and case_when(), but I don't know how to go about it.

Answer:

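if_any() and if_all() from dplyr apply the same predicate function to a selection of columns and combine the results into a single logical vector: for each row, if_any() is TRUE when the predicate holds for at least one selected column, and if_all() is TRUE when it holds for all of them. case_when() checks its conditions from top to bottom and uses the first one that is TRUE.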

library(dplyr)

df %>%
  mutate(filter = case_when(
    !if_any(A:H, is.na) ~ 1L,
     if_any(B:D, is.na) ~ 2L,
     if_any(F:H, is.na) ~ 3L
  ))

# # A tibble: 9 × 9
#       A B     C     D     E     F     G     H     filter
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <int>
# 1   123 xyz   abc   xyz   abc   xyz   abc   xyz        1
# 2   123 xyz   abc   xyz   abc   NA    abc   xyz        3
# 3   123 NA    abc   xyz   abc   xyz   abc   xyz        2
# 4   123 xyz   NA    xyz   abc   xyz   abc   abc        2
# 5   345 lmn   NA    xyz   abc   xyz   abc   efg        2
# 6   345 lmn   rst   xyz   abc   xyz   abc   NA         3
# 7   567 lmn   rst   xyz   abc   xyz   abc   NA         3
# 8   567 lmn   rst   NA    abc   xyz   abc   rst        2
# 9   567 lmn   rst   xyz   abc   xyz   NA    rst        3

Data
df <- read.table(text = "A B C D E F G H
123 xyz abc xyz abc xyz abc xyz
123 xyz abc xyz abc NA abc xyz
123 NA abc xyz abc xyz abc xyz
123 xyz NA xyz abc xyz abc abc
345 lmn NA xyz abc xyz abc efg
345 lmn rst xyz abc xyz abc NA
567 lmn rst xyz abc xyz abc NA
567 lmn rst NA abc xyz abc rst
567 lmn rst xyz abc xyz NA rst", header = TRUE)
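The sample data never has NAs in both ranges at once, or an NA only in column A or E. The sketch below checks what the if_any() version does in those cases, using a small hypothetical data frame `edge` made up for illustration (it is not from the question). Because case_when() takes the first TRUE condition, a row with NAs in both B:D and F:H gets 2, and a row whose only NA is in E matches no condition and gets NA. It also shows if_all() with a negated predicate as an equivalent way to flag complete rows.

```r
library(dplyr)

# Hypothetical rows (not from the question) covering two unseen cases:
# row 1 has NAs in both B:D and F:H, row 2 has its only NA in column E,
# row 3 has no NA at all.
edge <- data.frame(
  A = 1:3,
  B = c(NA, "x", "x"),
  C = "x", D = "x",
  E = c("x", NA, "x"),
  F = "x", G = "x",
  H = c(NA, "x", "x")
)

res <- edge %>%
  mutate(
    # if_all() with a negated predicate is the same test as !if_any(A:H, is.na)
    complete = if_all(A:H, ~ !is.na(.x)),
    filter = case_when(
      !if_any(A:H, is.na) ~ 1L,  # conditions are checked top to bottom;
       if_any(B:D, is.na) ~ 2L,  # the first TRUE one wins, so row 1 gets 2
       if_any(F:H, is.na) ~ 3L   # row 2 matches none of them and gets NA
    )
  )

res$complete
# [1] FALSE FALSE  TRUE
res$filter
# [1]  2 NA  1
```

If rows like row 2 can occur in the real data, they need an extra condition in case_when() or they will be left as NA.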

Answer:

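You can use c_across() inside a rowwise() pipeline to select the columns to check in each row. Note that this version stores filter as a character column ("1", "2", "3") rather than as integers.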

library(dplyr)

df %>% rowwise() %>% 
  mutate(filter = case_when(sum(is.na(c_across(B:H))) == 0 ~ "1",
                            sum(is.na(c_across(B:D))) >= 1 ~ "2",
                            sum(is.na(c_across(F:H))) >= 1 ~ "3")) %>% 
  ungroup()

# # A tibble: 9 × 9
#       A B     C     D     E     F     G     H     filter
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1   123 xyz   abc   xyz   abc   xyz   abc   xyz   1
# 2   123 xyz   abc   xyz   abc   NA    abc   xyz   3
# 3   123 NA    abc   xyz   abc   xyz   abc   xyz   2
# 4   123 xyz   NA    xyz   abc   xyz   abc   abc   2
# 5   345 lmn   NA    xyz   abc   xyz   abc   efg   2
# 6   345 lmn   rst   xyz   abc   xyz   abc   NA    3
# 7   567 lmn   rst   xyz   abc   xyz   abc   NA    3
# 8   567 lmn   rst   NA    abc   xyz   abc   rst   2
# 9   567 lmn   rst   xyz   abc   xyz   NA    rst   3
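rowwise() evaluates the expression once per row, so on large data it tends to be slower than a vectorized approach. As an alternative that neither answer uses, here is a base R sketch built on rowSums() and is.na() over column subsets. It assumes the column names A to H from the sample data, and it assigns the codes from lowest to highest priority so that later assignments overwrite earlier ones, which mirrors the top-to-bottom order of case_when().

```r
# The sample data from the question
df <- read.table(text = "A B C D E F G H
123 xyz abc xyz abc xyz abc xyz
123 xyz abc xyz abc NA abc xyz
123 NA abc xyz abc xyz abc xyz
123 xyz NA xyz abc xyz abc abc
345 lmn NA xyz abc xyz abc efg
345 lmn rst xyz abc xyz abc NA
567 lmn rst xyz abc xyz abc NA
567 lmn rst NA abc xyz abc rst
567 lmn rst xyz abc xyz NA rst", header = TRUE)

# is.na() on a data frame gives a logical matrix; rowSums() counts the
# TRUE values in each row, so "> 0" flags rows with at least one NA
na_any <- rowSums(is.na(df)) > 0
na_bd  <- rowSums(is.na(df[, c("B", "C", "D")])) > 0
na_fh  <- rowSums(is.na(df[, c("F", "G", "H")])) > 0

# Start with NA and assign from lowest to highest priority, so a row
# matching several rules ends up with the code case_when() would give it
filter <- rep(NA_integer_, nrow(df))
filter[na_fh]   <- 3L
filter[na_bd]   <- 2L
filter[!na_any] <- 1L
df$filter <- filter

df$filter
# [1] 1 3 2 2 2 3 3 2 3
```

As with the dplyr versions, a row whose only NA is in column A or E would keep filter = NA.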