I have data with the following structure:
df <- data.frame(year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1980, 1981, 1982, 1983, 1984),
                 id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
                 value = c(4, 3, 5, 8, 9, 5, 1, 5, 6, 4, 5, 6, 3, 2))
The data frame contains observations for each individual (id = 1, 2 and 3) for the years 1980 to 1984. However, one individual (id = 2) is missing one year of observations (1984). I would like to identify that individual and drop it from my data frame.
So the expected output would be the following:
year id value
1 1980 1 4
2 1981 1 3
3 1982 1 5
4 1983 1 8
5 1984 1 9
6 1980 3 4
7 1981 3 5
8 1982 3 6
9 1983 3 3
10 1984 3 2
I started by counting the observations for each id, but I do not know how to tell R to keep only the rows of ids with 5 observations (5 being the number of years in the period studied):
library(dplyr)
summary <- df %>%
  group_by(id) %>%
  summarise(headcount = n())
CodePudding user response:
# Keep only ids with more than 4 rows, i.e. all 5 years (requires dplyr)
new_df <- df %>% group_by(id) %>% filter(n() > 4)
CodePudding user response:
With n_distinct:
library(dplyr)
df %>%
group_by(id) %>%
filter(n_distinct(year) >= 5)
# A tibble: 10 × 3
# Groups: id [2]
year id value
<dbl> <dbl> <dbl>
1 1980 1 4
2 1981 1 3
3 1982 1 5
4 1983 1 8
5 1984 1 9
6 1980 3 4
7 1981 3 5
8 1982 3 6
9 1983 3 3
10 1984 3 2
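Both answers hard-code the number of years. As a minimal sketch, assuming the full study period is simply the set of years that appears anywhere in df, the threshold could be derived from the data instead. The names n_years, complete_df and dropped_ids are illustrative and not from the original posts:

```r
library(dplyr)

df <- data.frame(
  year  = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1980, 1981, 1982, 1983, 1984),
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  value = c(4, 3, 5, 8, 9, 5, 1, 5, 6, 4, 5, 6, 3, 2)
)

# Assumption: every year of the study period is observed for at least one id
n_years <- n_distinct(df$year)

# Keep only ids observed in every year, then drop the grouping
complete_df <- df %>%
  group_by(id) %>%
  filter(n_distinct(year) == n_years) %>%
  ungroup()

# ids removed because at least one year is missing
dropped_ids <- setdiff(unique(df$id), unique(complete_df$id))
```

Here dropped_ids identifies the incomplete individual, and complete_df holds the remaining rows. If no single id covers the whole period but the years are known in advance, n_years could be set by hand instead.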