Delete records containing more than 5 null values?-CodePudding

I would like to know how I can remove from a dataset the records that have more than 5 null values in the columns that define them. The following code allows you to delete records with any NA in any column. However, how can I modify it to do exactly what I ask? Any ideas?

df [ complete.cases (df),]

CodePudding user response：

Here is an example data frame. One of the rows has 6 NA values. We sum the NA values by row in a new column, filter where the number of NA is less than or equal to 5, then remove the new column.

df <- data.frame(a = c(1,NA,1,1),
                 b = c(1, NA, NA, 1),
                 c = c(1, NA, NA, NA),
                 d = c(1, NA, NA ,NA),
                 e = c(1, NA, NA, NA),
                 f = c(1, NA, NA, NA))

   a  b  c  d  e  f
1  1  1  1  1  1  1
2 NA NA NA NA NA NA
3  1 NA NA NA NA NA
4  1  1 NA NA NA NA

 
df %>%
    mutate(count = rowSums(is.na(df))) %>%
    filter(count <= 5) %>%
    select(-count)

  a  b  c  d  e  f
1 1  1  1  1  1  1
2 1 NA NA NA NA NA
3 1  1 NA NA NA NA

CodePudding user response：

I'm assuming you are referring to values of NA in your data indicating a missing value. NULL is returned by expressions and functions whose value is undefined. First create some reproducible data:

set.seed(42)
vals <- sample.int(1000, 250)
idx <- sample.int(250, 100)
vals[idx] <- NA
example <- as.data.frame(matrix(vals, 25))

Now compute the number of missing values by row and exclude the rows with more than 5 missing values:

na.count <- rowSums(is.na(example))
example[na.count<=5, ]