I would like to know how to get rows based on missing values on multiple columns.
Can is.na()
be used for this purpose?
My dataset looks like this:
NO | grade | level | score | class |
---|---|---|---|---|
1 | A | B | C | D |
2 | NA | A | NA | NA |
3 | i | j | NA | NA |
4 | W | NA | NA | NA |
5 | e | f | g | h |
6 | NA | A | NA | S |
7 | NA | NA | NA | NA |
8 | NA | NA | NA | B |
Firstly, I would like to get like below, where 3 columns have NA
:
NO | grade | level | score | class |
---|---|---|---|---|
2 | NA | A | NA | NA |
4 | W | NA | NA | NA |
8 | NA | NA | NA | B |
Secondly, I would like to get like below, where 2 columns have NA
:
NO | grade | level | score | class |
---|---|---|---|---|
3 | i | j | NA | NA |
6 | NA | A | NA | S |
CodePudding user response:
is.na
is indeed useful, and we could use it with rowSums
, filter
, and across
:
(1)
library(dplyr)
df |>
filter(rowSums(is.na(across(grade:class))) == 3)
Output:
# A tibble: 3 × 5
NO grade level score class
<dbl> <chr> <chr> <chr> <chr>
1 2 NA A NA NA
2 4 W NA NA NA
3 8 NA NA NA B
(2)
library(dplyr)
df |>
filter(rowSums(is.na(across(grade:class))) == 2)
Output:
# A tibble: 2 × 5
NO grade level score class
<dbl> <chr> <chr> <chr> <chr>
1 3 i j NA NA
2 6 NA A NA S
CodePudding user response:
You could use rowSums(!is.na(across(...))) == n
to select rows which have exact n
non-missing values.
library(dplyr)
df %>%
filter(rowSums(!is.na(across(-NO))) == 1)
This answer is similar with (and few minutes earlier than) @harre's but has opposite logic. We select rows which have exact n
missing & non-missing values respectively.
CodePudding user response:
In base R, you can use rowSums
:
df[rowSums(is.na(df)) == 3, ]
NO grade level score class
2 2 <NA> A <NA> <NA>
4 4 W <NA> <NA> <NA>
8 8 <NA> <NA> <NA> B
CodePudding user response:
Assuming that NO
is always not missing, you could try:
library(dplyr)
df %>%
filter(rowSums(!is.na(.)) == 2)
Output:
NO grade level score class
1 2 <NA> A <NA> <NA>
2 4 W <NA> <NA> <NA>
3 8 <NA> <NA> <NA> B
CodePudding user response:
One possible solution:
lapply(3:2, \(x, y) df[x==y,], rowSums(is.na(df[-1])))
[[1]]
NO grade level score class
2 2 <NA> A <NA> <NA>
4 4 W <NA> <NA> <NA>
8 8 <NA> <NA> <NA> B
[[2]]
NO grade level score class
3 3 i j <NA> <NA>
6 6 <NA> A <NA> S