Drop individuals with too few observations from a data frame in R

Time:01-20

I have the following structure of data:

df <- data.frame(year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1980, 1981, 1982, 1983, 1984), 
                id = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3), 
                value = c(4,3,5,8,9,5,1,5,6,4,5,6,3,2))

The data frame contains observations for each individual (id = 1, 2 and 3) for the years 1980 to 1984. However, one individual (id = 2) is missing one year of observations. I would like to identify that individual and drop it from my data frame.

So the expected output would be the following:

   year id value
1  1980  1     4
2  1981  1     3
3  1982  1     5
4  1983  1     8
5  1984  1     9
6  1980  3     4
7  1981  3     5
8  1982  3     6
9  1983  3     3
10 1984  3     2

I started by counting the observations for each id, but I do not know how to tell R to keep only the rows of ids with 5 observations (5 = the number of years in the period studied):

library(dplyr)

# Count the observations available for each id
summary <- df %>% 
  group_by(id) %>% 
  summarise(headcount = n())

CodePudding user response:

library(dplyr)
new_df <- df %>% group_by(id) %>% filter(n() > 4) %>% ungroup()
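
The filter above hard-codes the threshold. As a minimal sketch that builds on the per-id counts from the question, the number of years can instead be derived from the data and the complete ids joined back with `semi_join()`. The names `n_years`, `counts` and `complete_ids` are illustrative and not from the original post, and the sketch assumes every year of the period appears for at least one id.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  year  = c(1980:1984, 1980:1983, 1980:1984),
  id    = c(rep(1, 5), rep(2, 4), rep(3, 5)),
  value = c(4, 3, 5, 8, 9, 5, 1, 5, 6, 4, 5, 6, 3, 2)
)

# Length of the study period, taken from the data rather than hard-coded
# (assumption: each year of the period is observed for at least one id)
n_years <- n_distinct(df$year)

# Per-id observation counts, as in the question
counts <- df %>%
  group_by(id) %>%
  summarise(headcount = n())

# ids observed in every year of the period
complete_ids <- counts %>% filter(headcount == n_years)

# Keep only the rows of df whose id appears in complete_ids
new_df <- df %>% semi_join(complete_ids, by = "id")
```

`semi_join()` only filters rows of `df`, so no `headcount` column is added to the result.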

CodePudding user response:

With n_distinct(), which counts the distinct years per id, so a duplicated year is not counted twice:

library(dplyr)
df %>% 
  group_by(id) %>% 
  filter(n_distinct(year) >= 5)

# A tibble: 10 × 3
# Groups:   id [2]
    year    id value
   <dbl> <dbl> <dbl>
 1  1980     1     4
 2  1981     1     3
 3  1982     1     5
 4  1983     1     8
 5  1984     1     9
 6  1980     3     4
 7  1981     3     5
 8  1982     3     6
 9  1983     3     3
10  1984     3     2
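
If dplyr is not available, the same idea can probably be expressed in base R. This is a sketch only, using `ave()` to attach the group size to every row. The names `obs_per_row` and `base_df` are made up for illustration, and the threshold of 5 is taken from the question.

```r
# Same data as in the question
df <- data.frame(
  year  = c(1980:1984, 1980:1983, 1980:1984),
  id    = c(rep(1, 5), rep(2, 4), rep(3, 5)),
  value = c(4, 3, 5, 8, 9, 5, 1, 5, 6, 4, 5, 6, 3, 2)
)

# For each row, the number of rows that share that row's id
obs_per_row <- ave(df$year, df$id, FUN = length)

# Keep the rows of ids with at least 5 observations
base_df <- df[obs_per_row >= 5, ]
```

Unlike the dplyr result, `base_df` keeps the row names of the original data frame. `rownames(base_df) <- NULL` resets them if that matters.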