I am trying to drop rows with NAs. The NAs need to be in all the columns I specify. For example, if I specify Tickets and Group, then there need to be NAs in both columns for a row to be dropped. I tried doing
df %>% drop_na(Tickets, Group)
but got an error about an unexpected ',' and ')'. Basically, there are no NAs in City, State, or Date, and I want to remove all the rows with NAs in ALL the other columns.
City | State | Date | Tickets | Group |
---|---|---|---|---|
Chicago | IL | 2021-01-01 | NA | NA |
Chicago | IL | 2021-02-01 | NA | NA |
Chicago | IL | 2021-03-01 | 4 | NA |
Chicago | IL | 2021-03-01 | 3 | 1 |
This is what I want:
City | State | Date | Tickets | Group |
---|---|---|---|---|
Chicago | IL | 2021-03-01 | 4 | NA |
Chicago | IL | 2021-03-01 | 3 | 1 |
TLDR: I am trying to drop rows which have an NA value in all specified columns.
I'd appreciate help with this.
CodePudding user response:
Using base R
subset(df1, rowSums(is.na(df1[c("Tickets", "Group")])) < 2)
     City State       Date Tickets Group
3 Chicago    IL 2021-03-01       4    NA
4 Chicago    IL 2021-03-01       3     1
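To make the rowSums idea reusable for any number of columns, the hard-coded 2 can be replaced by the number of columns being checked. This is only a minimal sketch: the helper name drop_all_na is made up for illustration, and the data frame is rebuilt here from the table in the question so the block runs on its own.

```r
# Example data, same values as the table in the question
df1 <- data.frame(
  City    = rep("Chicago", 4),
  State   = rep("IL", 4),
  Date    = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group   = c(NA, NA, NA, 1L)
)

# Hypothetical helper (not from any package): keep a row when at least
# one of the columns named in `cols` is not NA
drop_all_na <- function(data, cols) {
  n_missing <- rowSums(is.na(data[cols]))  # count of NAs per row in `cols`
  data[n_missing < length(cols), , drop = FALSE]
}

result <- drop_all_na(df1, c("Tickets", "Group"))
print(result)
```

A row is dropped only when its NA count equals length(cols), so the same call works unchanged if more columns are added to cols.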
Or with if_any in dplyr:
library(dplyr)
df1 %>%
  filter(if_any(c(Tickets, Group), complete.cases))
     City State       Date Tickets Group
1 Chicago    IL 2021-03-01       4    NA
2 Chicago    IL 2021-03-01       3     1
data
df1 <- structure(list(
  City = c("Chicago", "Chicago", "Chicago", "Chicago"),
  State = c("IL", "IL", "IL", "IL"),
  Date = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group = c(NA, NA, NA, 1L)
), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
Use filter.
library(dplyr)
df %>% filter(!is.na(Tickets) | !is.na(Group))
#     City State       Date Tickets Group
#1 Chicago    IL 2021-03-01       4    NA
#2 Chicago    IL 2021-03-01       3     1
CodePudding user response:
We could use negated if_all:
library(dplyr)
df %>%
  filter(!if_all(c(Tickets, Group), is.na))
     City State       Date Tickets Group
1 Chicago    IL 2021-03-01       4    NA
2 Chicago    IL 2021-03-01       3     1
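Since the question describes the target columns as "all the other columns" besides City, State, and Date, the selection inside if_all can probably also be written as a complement instead of naming Tickets and Group. A minimal sketch, assuming a dplyr version that provides if_all (1.0.4 or later) and rebuilding the example data so it runs standalone:

```r
library(dplyr)

# Example data, same values as the table in the question
df <- data.frame(
  City    = rep("Chicago", 4),
  State   = rep("IL", 4),
  Date    = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group   = c(NA, NA, NA, 1L)
)

# !c(City, State, Date) selects every column except those three;
# a row is removed only when all of the selected columns are NA
kept <- df %>%
  filter(!if_all(!c(City, State, Date), is.na))

print(kept)
```

This avoids editing the column list when more measurement columns are added, at the cost of relying on the key columns being named correctly.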
CodePudding user response:
Thank you for your comments, everyone; they helped me work through this. I think I found a solution. For anyone wondering:
I found the rows with NAs in both columns by slightly editing the code from one of the answers above. I then anti-joined that with my df, which has all observations, to get my desired outcome.
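# Rows where both Tickets and Group are NA
data_NA <-
  df %>%
  filter(is.na(Tickets) & is.na(Group))
# Remove those rows from the full data (joins on all shared columns)
df <-
  df %>%
  anti_join(data_NA)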
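For what it's worth, the filter plus anti-join above should give the same rows as negating the filter condition in a single step. The sketch below compares the two on the example data from the question; the names two_step and one_step are made up for illustration, and by = names(df) is added only to make the join columns explicit.

```r
library(dplyr)

# Example data, same values as the table in the question
df <- data.frame(
  City    = rep("Chicago", 4),
  State   = rep("IL", 4),
  Date    = c("2021-01-01", "2021-02-01", "2021-03-01", "2021-03-01"),
  Tickets = c(NA, NA, 4L, 3L),
  Group   = c(NA, NA, NA, 1L)
)

# Two-step version: collect the all-NA rows, then anti-join them away
data_NA  <- df %>% filter(is.na(Tickets) & is.na(Group))
two_step <- df %>% anti_join(data_NA, by = names(df))

# One-step version: keep every row that is NOT NA in both columns
one_step <- df %>% filter(!(is.na(Tickets) & is.na(Group)))

print(two_step)
print(one_step)
```

The one-step form avoids the intermediate data frame. The anti-join version also relies on dplyr joins treating NA as matching NA by default, which is what lets the all-NA rows match themselves.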