Home > Back-end >  Select rows where only one column has a value and all other columns have NA
Select rows where only one column has a value and all other columns have NA

Time:08-15

I would like to know how to get rows based on missing values on multiple columns.

Can is.na() be used for this purpose?

My dataset looks like this:

NO grade level score class
1 A B C D
2 NA A NA NA
3 i j NA NA
4 W NA NA NA
5 e f g h
6 NA A NA S
7 NA NA NA NA
8 NA NA NA B

Firstly, I would like to get like below, where 3 columns have NA:

NO grade level score class
2 NA A NA NA
4 W NA NA NA
8 NA NA NA B

Secondly, I would like to get like below, where 2 columns have NA:

NO grade level score class
3 i j NA NA
6 NA A NA S

CodePudding user response:

is.na is indeed useful, and we could use it with rowSums, filter, and across:

(1)

library(dplyr)

df |>
  filter(rowSums(is.na(across(grade:class))) == 3)

Output:

# A tibble: 3 × 5
     NO grade level score class
  <dbl> <chr> <chr> <chr> <chr>
1     2 NA    A     NA    NA   
2     4 W     NA    NA    NA   
3     8 NA    NA    NA    B    

(2)

library(dplyr)

df |>
  filter(rowSums(is.na(across(grade:class))) == 2)

Output:

# A tibble: 2 × 5
     NO grade level score class
  <dbl> <chr> <chr> <chr> <chr>
1     3 i     j     NA    NA   
2     6 NA    A     NA    S    

CodePudding user response:

You could use rowSums(!is.na(across(...))) == n to select rows which have exact n non-missing values.

library(dplyr)

df %>%
  filter(rowSums(!is.na(across(-NO))) == 1)

This answer is similar with (and few minutes earlier than) @harre's but has opposite logic. We select rows which have exact n missing & non-missing values respectively.

CodePudding user response:

In base R, you can use rowSums:

df[rowSums(is.na(df)) == 3, ]

  NO grade level score class
2  2  <NA>     A  <NA>  <NA>
4  4     W  <NA>  <NA>  <NA>
8  8  <NA>  <NA>  <NA>     B

CodePudding user response:

Assuming that NO is always not missing, you could try:

library(dplyr)

df %>%
  filter(rowSums(!is.na(.)) == 2)

Output:

  NO grade level score class
1  2  <NA>     A  <NA>  <NA>
2  4     W  <NA>  <NA>  <NA>
3  8  <NA>  <NA>  <NA>     B

CodePudding user response:

One possible solution:

lapply(3:2, \(x, y) df[x==y,], rowSums(is.na(df[-1])))

[[1]]
  NO grade level score class
2  2  <NA>     A  <NA>  <NA>
4  4     W  <NA>  <NA>  <NA>
8  8  <NA>  <NA>  <NA>     B

[[2]]
  NO grade level score class
3  3     i     j  <NA>  <NA>
6  6  <NA>     A  <NA>     S
  • Related