How do I drop NAs where all cells have an NA in the columns I specify?

Time:12-06

I am trying to drop rows that contain NAs, but only when the NAs appear in all of the columns I specify. For example, if I specify Tickets and Group, a row should be dropped only when both columns are NA in that row. I tried df %>% drop_na(Tickets, Group) but got an error saying unexpected , and ). There are no NAs in City, State, or Date, and I want to remove every row that has NAs in ALL the other columns.

City     State  Date        Tickets  Group
Chicago  IL     2021-01-01  NA       NA
Chicago  IL     2021-02-01  NA       NA
Chicago  IL     2021-03-01  4        NA
Chicago  IL     2021-03-01  3        1

This is what I want:

City     State  Date        Tickets  Group
Chicago  IL     2021-03-01  4        NA
Chicago  IL     2021-03-01  3        1

TLDR: I am trying to drop rows which have an NA value in all specified columns.

I'd appreciate help with this.

CodePudding user response:

Using base R, keep the rows where fewer than 2 of the specified columns are NA:

subset(df1, rowSums(is.na(df1[c("Tickets", "Group")])) < 2)
     City State       Date Tickets Group
3 Chicago    IL 2021-03-01       4    NA
4 Chicago    IL 2021-03-01       3     1
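The hard-coded 2 in the base R call is the number of columns being checked. A minimal sketch that generalizes this to any set of columns follows; the helper name drop_all_na is hypothetical (it is not part of base R), and the data frame is rebuilt from the question's sample so the block runs on its own.

```r
# Sample data matching the question
df1 <- data.frame(
  City    = rep("Chicago", 4),
  State   = rep("IL", 4),
  Date    = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group   = c(NA, NA, NA, 1L)
)

# Hypothetical helper: drop rows where every column named in `cols` is NA
drop_all_na <- function(data, cols) {
  # Count the NAs per row across the chosen columns only
  na_count <- rowSums(is.na(data[cols]))
  # Keep a row when at least one of the chosen columns is not NA
  data[na_count < length(cols), , drop = FALSE]
}

result <- drop_all_na(df1, c("Tickets", "Group"))
print(result)
```

Comparing against length(cols) rather than a literal number means the same call should keep working if more columns are added to the check.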

Or with if_any in dplyr, which keeps a row when at least one of the specified columns is not NA:

library(dplyr)
df1 %>% 
   filter(if_any(c(Tickets, Group), complete.cases))
     City State       Date Tickets Group
1 Chicago    IL 2021-03-01       4    NA
2 Chicago    IL 2021-03-01       3     1

data

df1 <- structure(list(City = c("Chicago", "Chicago", "Chicago", "Chicago"
), State = c("IL", "IL", "IL", "IL"), Date = c("2021-01-01", 
"2021-02-01", "2021-03-01", "2021-03-01"), Tickets = c(NA, NA, 
4L, 3L), Group = c(NA, NA, NA, 1L)), class = "data.frame", 
row.names = c(NA, 
-4L))

CodePudding user response:

Use filter to keep the rows where at least one of the two columns is not NA.

library(dplyr)

df %>% filter(!is.na(Tickets) | !is.na(Group))
#     City State       Date Tickets Group
#1 Chicago    IL 2021-03-01       4    NA
#2 Chicago    IL 2021-03-01       3     1

CodePudding user response:

We could use negated if_all:

library(dplyr)
df %>%
  filter(!if_all(c(Tickets, Group), is.na))
   City State       Date Tickets Group
1 Chicago    IL 2021-03-01       4    NA
2 Chicago    IL 2021-03-01       3     1
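If the column names are stored in a character vector rather than typed inline, the same negated if_all filter can be written with all_of. This is a sketch assuming dplyr 1.0.4 or later (where if_all is available); the variable name cols is an illustration only, and the data frame is rebuilt from the question's sample.

```r
library(dplyr)

# Sample data matching the question
df <- data.frame(
  City    = rep("Chicago", 4),
  State   = rep("IL", 4),
  Date    = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group   = c(NA, NA, NA, 1L)
)

# Columns to check, held as strings (hypothetical variable name)
cols <- c("Tickets", "Group")

# Drop rows where all of the listed columns are NA
kept <- df %>%
  filter(!if_all(all_of(cols), is.na))

print(kept)
```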

CodePudding user response:

Thank you for your comments, everyone. They helped me work through the problem, and I think I found a solution. For anyone wondering:

I found the rows with all NAs by slightly editing the code from one of the answers above. I then anti-joined that result with my df, which has all observations, to get my desired outcome.

library(dplyr)

# Rows where both Tickets and Group are NA
data_NA <-
  df %>%
  filter(is.na(Tickets) & is.na(Group))

# Remove those rows from the full data
df <-
  df %>%
  anti_join(data_NA)
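For comparison, here is a sketch that runs the two-step anti-join next to a single negated filter on the question's sample data, where both should keep the same two rows. The names two_step and one_step are illustrative only. Passing by = names(df) is an addition made here so the join columns are stated explicitly; without by, anti_join joins on all shared columns and prints a message saying so.

```r
library(dplyr)

# Sample data matching the question
df <- data.frame(
  City    = rep("Chicago", 4),
  State   = rep("IL", 4),
  Date    = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group   = c(NA, NA, NA, 1L)
)

# Two-step version: collect the all-NA rows, then anti-join them away
data_NA  <- df %>% filter(is.na(Tickets) & is.na(Group))
two_step <- df %>% anti_join(data_NA, by = names(df))

# One-step version: negate the same condition inside filter
one_step <- df %>% filter(!(is.na(Tickets) & is.na(Group)))

print(two_step)
print(one_step)
```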
