Home > Software engineering >  Remove rows conditionally based on NA values in other rows
Remove rows conditionally based on NA values in other rows

Time:08-24

I have a data frame like this:

 city   year value
   <chr> <dbl> <dbl>
 1 la        1    NA
 2 la        2    NA
 3 la        3    NA
 4 la        4    20
 5 la        5    25
 6 nyc       1    18
 7 nyc       2    29
 8 nyc       3    24
 9 nyc       4    17
10 nyc       5    30

I would like to remove any cities that don't have a complete 5 years worth of data. So in this case, I'd like to remove all rows for city la despite the fact that there is data for years 4 and 5, resulting in the following data frame:

 city   year value
  <chr> <dbl> <dbl>
1 nyc       1    18
2 nyc       2    29
3 nyc       3    24
4 nyc       4    17
5 nyc       5    30

Is this possible? Thanks in advance.

CodePudding user response:

In Base R:

subset(df, !ave(value, city, FUN = anyNA))

   city year value
6   nyc    1    18
7   nyc    2    29
8   nyc    3    24
9   nyc    4    17
10  nyc    5    30

in Tidyverse

df %>%
  group_by(city) %>%
  filter(!anyNA(value))

# A tibble: 5 x 3
# Groups:   city [1]
  city   year value
  <chr> <int> <int>
1 nyc       1    18
2 nyc       2    29
3 nyc       3    24
4 nyc       4    17
5 nyc       5    30

or even

df %>%
   group_by(city) %>%
   filter(all(!is.na(value)))

CodePudding user response:

Another base R option with ave

> subset(df, !is.na(ave(value, city)))
   city year value
6   nyc    1    18
7   nyc    2    29
8   nyc    3    24
9   nyc    4    17
10  nyc    5    30

or a data.table one

> library(data.table)

> setDT(df)[, .SD[!anyNA(value)], city]
   city year value
1:  nyc    1    18
2:  nyc    2    29
3:  nyc    3    24
4:  nyc    4    17
5:  nyc    5    30
  • Related