I have a dataframe like the one below, with units ('teams') observed over time. I want to identify and drop all teams where in at least one column all observations are NA. In this example, only team "A" would not be dropped.
df <- data.frame(team=c("A", "A", "B", "B", "C", "C"),
year=c(1992, 1993, 1992, 1993, 1992, 1993),
points=c(NA, 15, 17, 24, NA, NA),
assists=c(4, 7, NA, NA, 12, NA))
team year points assists
1 A 1992 NA 4
2 A 1993 15 7
3 B 1992 17 NA
4 B 1993 24 NA
5 C 1992 NA 12
6 C 1993 NA NA
I have a slight preference for dplyr
, but any solution is welcome. I can think of doing group_by() and filter() but don't know how to do the "at least one column" part. Thanks a lot for your help!
CodePudding user response:
You could use if_any/if_all
:
library(dplyr)
df %>%
group_by(team) %>%
filter(!if_any(everything(), ~ all(is.na(.x)))) %>%
ungroup()
or
df %>%
group_by(team) %>%
filter(if_all(everything(), ~ !all(is.na(.x)))) %>%
ungroup()
# # A tibble: 2 × 4
# team year points assists
# <chr> <dbl> <dbl> <dbl>
# 1 A 1992 NA 4
# 2 A 1993 15 7
everything()
can be skipped (but the comma remains) because it's the default of if_any/if_all
. I.e.
filter(if_all(, ~ !all(is.na(.x))))
CodePudding user response:
Using base:
# get the team
x <- sapply(split(df, df$team), function(i)
!any(colSums(is.na(i)) == nrow(i)))
x
# A B C
# TRUE FALSE FALSE
#then filter
df[ df$team %in% names(x[ x ]), ]
# team year points assists
# 1 A 1992 NA 4
# 2 A 1993 15 7
CodePudding user response:
You may use:
na.omit(df)
for all columns or specify columns for missing values:
na.omit(df$points)