Home > Software engineering >  Remove last row of data frame until reaching a row that does not have an NA
Remove last row of data frame until reaching a row that does not have an NA

Time:04-06

I have a dataframe that contains NA values, and I want to remove some rows that have an NA (i.e., not complete cases). However, I only want to remove rows at the end of the dataframe. So, I want to keep any rows that have an NA that are not at the end of the dataframe. What is the most efficient way to remove the ending rows with NAs without using a row index?

Data

df <- structure(list(var1 = 1:15, var2 = c(3, 6, 3, NA, 2, NA, 3, 4, 
2, NA, 4, 2, 45, 2, 1), var3 = c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 15L, 16L, NA, NA, NA, NA), var4 = c(NA, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, NA)), class = "data.frame", row.names = c(NA, 
-15L))

Expected Output

So, in this example, I removed rows 12 through 15 since they have an NA and row 11 does not have an NA.

   var1 var2 var3 var4
1     1    3    6   NA
2     2    6    7    7
3     3    3    8    8
4     4   NA    9    9
5     5    2   10   10
6     6   NA   11   11
7     7    3   12   12
8     8    4   13   13
9     9    2   14   14
10   10   NA   15   15
11   11    4   16   16

I know that if I wanted to remove the first row of data frame until reaching the first row with no NA, then I could do the following:

df[cumsum(complete.cases(df)) != 0,]

CodePudding user response:

We could use na.trim from zoo package:

library(zoo)
library(dplyr)

df %>% 
  slice(1:nrow(na.trim(df, "right", is.na = "any")))
   var1 var2 var3 var4
1     1    3    6   NA
2     2    6    7    7
3     3    3    8    8
4     4   NA    9    9
5     5    2   10   10
6     6   NA   11   11
7     7    3   12   12
8     8    4   13   13
9     9    2   14   14
10   10   NA   15   15
11   11    4   16   16

CodePudding user response:

To remove the last rows, you can use rev in the same approach. So, we put the output from complete.cases in reverse order, so that we can calculate cumsum starting with the end of the dataframe. Then, we put the cumsum back in the original order (the reason for the second rev). Now, we can remove the rows with 0 (i.e., rows that contain an NA).

df[rev(cumsum(rev(complete.cases(df)))) != 0,]

Or with dplyr:

library(dplyr)

df %>% 
  filter(rev(cumsum(rev(complete.cases(.)))) != 0)

Output

   var1 var2 var3 var4
1     1    3    6   NA
2     2    6    7    7
3     3    3    8    8
4     4   NA    9    9
5     5    2   10   10
6     6   NA   11   11
7     7    3   12   12
8     8    4   13   13
9     9    2   14   14
10   10   NA   15   15
11   11    4   16   16
  • Related