Filter by specific category in R

I have a specific filtering question. Here is what my sample dataset looks like:

df <- data.frame(id = c(1,2,3,3,4,5),
                 cat= c("A","A","A","B","B","B"))
> df
  id cat
1  1     A
2  2     A
3  3     A
4  3     B
5  4     B
6  5     B

Grouping by id, when an id has more than one category in cat, I want to keep only the row where cat is "A". Ids with a single category should be kept as they are. So the desired output would be:

> df.1
  id cat
1  1   A
2  2   A
3  3   A
4  4   B
5  5   B

Any ideas?

Thanks!

CodePudding user response：

In this example you can take the first item from each group, because "A" always comes first within an id. In other situations you may need to reorder the rows with arrange() beforehand.

(using dplyr)

library(dplyr)

df %>% group_by(id) %>% summarise(cat = first(cat))
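A minimal sketch of the reordering case, assuming the "A" row is not always first within an id. The data frame df2 is a hypothetical variation of the sample data, not something from the question. Sorting on the logical expression cat != "A" puts the "A" rows first within each id, because FALSE sorts before TRUE.

```r
library(dplyr)

# Hypothetical input: for id 3 the "B" row comes before the "A" row
df2 <- data.frame(id  = c(1, 2, 3, 3, 4, 5),
                  cat = c("A", "A", "B", "A", "B", "B"))

res <- df2 %>%
  arrange(id, cat != "A") %>%   # within each id, "A" rows sort first
  group_by(id) %>%
  summarise(cat = first(cat))   # one row per id: "A" if present, else the first cat

res
```

Note that summarise() returns exactly one row per id, so this only fits when a single row per id is wanted.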

CodePudding user response：

Base R:

aggregate(
  df$cat, 
  by = list(id = df$id), 
  FUN = \(x) {
    unx <- unique(x)
    if (length(unx) > 1) 'A' else unx
  }
)
#   id x
# 1  1 A
# 2  2 A
# 3  3 A
# 4  4 B
# 5  5 B

CodePudding user response：

If there are only two groups in cat, we can use the following logic:

df %>%
  group_by(id) %>%
  filter(! (n() == 2 & cat == "B"))

# A tibble: 5 x 2
# Groups:   id [5]
     id cat  
  <dbl> <chr>
1     1 A    
2     2 A    
3     3 A    
4     4 B    
5     5 B

When multiple other letters are possible:

df <- data.frame(id = c(1,2,3,3,4,5,6,6,6,7),
                 cat= c("A","A","A","B","B","B", "A", "B", "C","D"))
df %>%
  group_by(id) %>%
  filter(! (n() >= 2 & cat %in% LETTERS[2:26]))
# A tibble: 7 x 2
# Groups:   id [7]
     id cat  
  <dbl> <chr>
1     1 A    
2     2 A    
3     3 A    
4     4 B    
5     5 B    
6     6 A    
7     7 D

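Explanation: n() gives the current group size. In the first version, rows are dropped only when the group has two rows and cat is "B"; everything else is kept. In the second version, rows are dropped when the group has two or more rows and cat is any letter other than "A".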
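As a possible simplification, the LETTERS[2:26] test can be written as cat != "A", which also covers category labels that are not single capital letters. This is only a sketch using the extended sample data from this answer. As with the original, a group with several rows and no "A" at all would be dropped entirely.

```r
library(dplyr)

df <- data.frame(id  = c(1, 2, 3, 3, 4, 5, 6, 6, 6, 7),
                 cat = c("A", "A", "A", "B", "B", "B", "A", "B", "C", "D"))

res <- df %>%
  group_by(id) %>%
  filter(!(n() >= 2 & cat != "A")) %>%  # in multi-row groups, drop every non-"A" row
  ungroup()

res
```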

CodePudding user response：

One approach with dplyr. After grouping by id, keep the rows where the id has only one row or cat is "A".

library(dplyr)

df %>%
  group_by(id) %>%
  filter(n() == 1 | cat == "A")

Output

     id cat  
  <dbl> <chr>
1     1 A    
2     2 A    
3     3 A    
4     4 B    
5     5 B

Also, if the same cat can be repeated within a single id, you can instead keep the rows where the id has only one distinct cat or cat is "A":

df %>%
  group_by(id) %>%
  filter(n_distinct(cat) == 1 | cat == "A")
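To illustrate the difference between the two filters, here is a small sketch with a hypothetical data frame df3 in which id 1 has the same cat twice. With n() == 1 both "B" rows of id 1 are dropped, while n_distinct(cat) == 1 keeps them.

```r
library(dplyr)

# Hypothetical input: id 1 repeats "B", id 2 has both "A" and "B"
df3 <- data.frame(id  = c(1, 1, 2, 2, 3),
                  cat = c("B", "B", "A", "B", "A"))

by_rows <- df3 %>%
  group_by(id) %>%
  filter(n() == 1 | cat == "A") %>%            # id 1 has two rows, so both are dropped
  ungroup()

by_distinct <- df3 %>%
  group_by(id) %>%
  filter(n_distinct(cat) == 1 | cat == "A") %>% # id 1 has one distinct cat, so both stay
  ungroup()

nrow(by_rows)
nrow(by_distinct)
```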

CodePudding user response：

Using base R

subset(df, cat == 'A' | id %in% names(which(table(id) == 1)))
  id cat
1  1   A
2  2   A
3  3   A
5  4   B
6  5   B
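An alternative base R sketch, assuming the original six-row df: ave() can compute the group size for every row directly, which avoids matching the numeric id against the character names returned by table(). The variable name group_size is only illustrative.

```r
df <- data.frame(id  = c(1, 2, 3, 3, 4, 5),
                 cat = c("A", "A", "A", "B", "B", "B"))

# For each row, the number of rows sharing that row's id
group_size <- ave(seq_along(df$id), df$id, FUN = length)

# Keep "A" rows, plus any row whose id occurs only once
res <- df[df$cat == "A" | group_size == 1, ]

res
```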