I have a data frame like this:
city year value
<chr> <dbl> <dbl>
1 la 1 NA
2 la 2 NA
3 la 3 NA
4 la 4 20
5 la 5 25
6 nyc 1 18
7 nyc 2 29
8 nyc 3 24
9 nyc 4 17
10 nyc 5 30
I would like to remove any cities that don't have a complete 5 years worth of data. So in this case, I'd like to remove all rows for city la despite the fact that there is data for years 4 and 5, resulting in the following data frame:
city year value
<chr> <dbl> <dbl>
1 nyc 1 18
2 nyc 2 29
3 nyc 3 24
4 nyc 4 17
5 nyc 5 30
Is this possible? Thanks in advance.
CodePudding user response:
In Base R:
subset(df, !ave(value, city, FUN = anyNA))
city year value
6 nyc 1 18
7 nyc 2 29
8 nyc 3 24
9 nyc 4 17
10 nyc 5 30
in Tidyverse
df %>%
group_by(city) %>%
filter(!anyNA(value))
# A tibble: 5 x 3
# Groups: city [1]
city year value
<chr> <int> <int>
1 nyc 1 18
2 nyc 2 29
3 nyc 3 24
4 nyc 4 17
5 nyc 5 30
or even
df %>%
group_by(city) %>%
filter(all(!is.na(value)))
CodePudding user response:
Another base R option with ave
> subset(df, !is.na(ave(value, city)))
city year value
6 nyc 1 18
7 nyc 2 29
8 nyc 3 24
9 nyc 4 17
10 nyc 5 30
or a data.table
one
> library(data.table)
> setDT(df)[, .SD[!anyNA(value)], city]
city year value
1: nyc 1 18
2: nyc 2 29
3: nyc 3 24
4: nyc 4 17
5: nyc 5 30