Home > database >  Drop unit if in at least one column its observations are all missing
Drop unit if in at least one column its observations are all missing

Time:09-12

I have a dataframe like the one below, with units ('teams') observed over time. I want to identify and drop all teams where in at least one column all observations are NA. In this example, only team "A" would not be dropped.

df <- data.frame(team=c("A", "A", "B", "B", "C", "C"),
                 year=c(1992, 1993, 1992, 1993, 1992, 1993),
                 points=c(NA, 15, 17, 24, NA, NA),
                 assists=c(4, 7, NA, NA, 12, NA))

  team year points assists
1    A 1992     NA       4
2    A 1993     15       7
3    B 1992     17      NA
4    B 1993     24      NA
5    C 1992     NA      12
6    C 1993     NA      NA

I have a slight preference for dplyr, but any solution is welcome. I can think of doing group_by() and filter() but don't know how to do the "at least one column" part. Thanks a lot for your help!

CodePudding user response:

You could use if_any/if_all:

library(dplyr)

df %>%
  group_by(team) %>%
  filter(!if_any(everything(), ~ all(is.na(.x)))) %>%
  ungroup()

or

df %>%
  group_by(team) %>%
  filter(if_all(everything(), ~ !all(is.na(.x)))) %>%
  ungroup()

# # A tibble: 2 × 4
#   team   year points assists
#   <chr> <dbl>  <dbl>   <dbl>
# 1 A      1992     NA       4
# 2 A      1993     15       7

everything() can be skipped (but the comma remains) because it's the default of if_any/if_all. I.e.

filter(if_all(, ~ !all(is.na(.x))))

CodePudding user response:

Using base:

# get the team
x <- sapply(split(df, df$team), function(i)
  !any(colSums(is.na(i)) == nrow(i)))
x
#    A     B     C 
# TRUE FALSE FALSE 

#then filter
df[ df$team %in% names(x[ x ]), ]
#   team year points assists
# 1    A 1992     NA       4
# 2    A 1993     15       7

CodePudding user response:

You may use:

na.omit(df)

for all columns or specify columns for missing values:

na.omit(df$points)
  • Related