Remove groups with only one individual in R without using the dplyr package

Consider the following dataset. Each group contains either one or two individuals, and an individual may have several rows.

group<-c(1,1,1,1,2,2,3,3,3,3,4,4)
individualID<-c(1,1,2,2,3,3,5,5,6,6,7,7)
X<-rbinom(12,1,0.5)
df1<-data.frame(group,individualID,X)
> df1
   group individualID X     
1      1            1  0 
2      1            1  1 
3      1            2  1 
4      1            2  1 
5      2            3  1 
6      2            3  1 
7      3            5  1 
8      3            5  1 
9      3            6  1 
10     3            6  1 
11     4            7  0 
12     4            7  1

From the above, groups 1 and 3 each have 2 individuals, whereas groups 2 and 4 have 1 individual each.

> aggregate(data = df1,  individualID ~ group, function(x) length(unique(x)))
  group individualID
1     1            2
2     2            1
3     3            2
4     4            1

How can I subset the data, without using the dplyr package, to keep only the groups that have more than one individual, i.e. omit the groups with a single individual?

I should end up with only group 1 and group 3.

CodePudding user response：

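There are certainly more concise ways, but here is the general idea. (The X values in the output differ from the question's printout because rbinom() draws new random values on each run.)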

# use your code to get the counts by group
df1_counts <- aggregate(data = df1,  individualID ~ group, function(x) length(unique(x)))

# create a vector of groups where the count is > 1
keep_groups <- df1_counts$group[df1_counts$individualID > 1]

# filter the rows to only groups you want to keep
df1[df1$group %in% keep_groups,]
#    group individualID X
# 1      1            1 0
# 2      1            1 0
# 3      1            2 1
# 4      1            2 0
# 7      3            5 1
# 8      3            5 1
# 9      3            6 0
# 10     3            6 1
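
As one possibly more concise base R variant of the steps above, here is a minimal sketch using tapply(). The seed and the names n_ind and result are assumptions added for illustration; they are not part of the original question or answer.

```r
# Rebuild the example data from the question
# (set.seed is an assumption, added only so the sketch is reproducible)
set.seed(1)
group <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4)
individualID <- c(1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7)
df1 <- data.frame(group, individualID, X = rbinom(12, 1, 0.5))

# Number of distinct individuals per group, named by group
n_ind <- tapply(df1$individualID, df1$group, function(x) length(unique(x)))

# Look up each row's group count by name and keep rows where it exceeds 1
result <- df1[n_ind[as.character(df1$group)] > 1, ]
result
```

Looking the count up by name avoids the intermediate keep_groups vector, at the cost of being slightly less explicit about which groups survive.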

CodePudding user response：

Another option uses dplyr (tidyverse): after grouping by 'group', keep the rows where the number of distinct values of 'individualID' (n_distinct) is greater than 1.

library(dplyr)
df1 %>%
    group_by(group) %>% 
    filter(n_distinct(individualID) > 1) %>%
    ungroup()
# A tibble: 8 × 3
  group individualID     X
  <dbl>        <dbl> <int>
1     1            1     0
2     1            1     0
3     1            2     1
4     1            2     1
5     3            5     0
6     3            5     0
7     3            6     1
8     3            6     0

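Or, in base R, with subset and ave: ave computes the number of distinct 'individualID' values within each group for every row, and subset keeps the rows where that count is greater than 1.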

subset(df1, ave(individualID, group, FUN = function(x) length(unique(x))) > 1)
   group individualID X
1      1            1 0
2      1            1 0
3      1            2 1
4      1            2 1
7      3            5 0
8      3            5 0
9      3            6 1
10     3            6 0
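
To see why the subset/ave one-liner works, it may help to look at the intermediate vector that ave() produces. The sketch below uses only the group and individualID vectors from the question; the name n_per_row is an assumption introduced for illustration.

```r
group <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4)
individualID <- c(1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7)

# ave() applies FUN within each group and writes the result back
# to every row of that group, so the output has one value per row
n_per_row <- ave(individualID, group, FUN = function(x) length(unique(x)))
n_per_row
# [1] 2 2 2 2 1 1 2 2 2 2 1 1

# Comparing with 1 gives the logical row filter that subset() receives
keep <- n_per_row > 1
keep
```

Because the counts are already aligned with the rows of the data frame, no separate lookup of groups to keep is needed.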