I have a specific filtering question. Here is what my sample dataset looks like:
df <- data.frame(id = c(1,2,3,3,4,5),
cat= c("A","A","A","B","B","B"))
> df
id cat
1 1 A
2 2 A
3 3 A
4 3 B
5 4 B
6 5 B
Grouping by id, when cat has multiple categories for the same id, I would like to keep only the row where cat is "A". So the desired output would be:
> df.1
id cat
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
Any ideas?
Thanks!
CodePudding user response:
In this example you can take the first item from each group (using dplyr). In other situations you may need to reorder the rows with arrange() first.
df %>% group_by(id) %>% summarise(cat = first(cat))
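As a rough sketch of the "reorder first" case: if the "A" row is not guaranteed to come first within an id, sorting on the logical cat != "A" puts it first before first() is applied. The data frame df_unordered below is a made-up variant of the question's data, not something from the original post.

```r
library(dplyr)

# Hypothetical variant of the sample data where "B" precedes "A" for id 3
df_unordered <- data.frame(id  = c(1, 2, 3, 3, 4, 5),
                           cat = c("A", "A", "B", "A", "B", "B"))

res <- df_unordered %>%
  arrange(id, cat != "A") %>%   # FALSE sorts before TRUE, so "A" rows come first within each id
  group_by(id) %>%
  summarise(cat = first(cat))   # keep the first cat per id after sorting
```

This still returns one row per id, so it only fits when a single row per id is wanted.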
CodePudding user response:
Base R:
aggregate(
df$cat,
by = list(id = df$id),
FUN = \(x) {
unx <- unique(x)
if (length(unx) > 1) 'A' else unx
}
)
# id x
# 1 1 A
# 2 2 A
# 3 3 A
# 4 4 B
# 5 5 B
CodePudding user response:
If there are only two groups in cat, we can use the following logic:
df %>%
group_by(id) %>%
filter(! (n() == 2 & cat == "B"))
# A tibble: 5 x 2
# Groups: id [5]
id cat
<dbl> <chr>
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
When there are multiple other letters possible:
df <- data.frame(id = c(1,2,3,3,4,5,6,6,6,7),
cat= c("A","A","A","B","B","B", "A", "B", "C","D"))
df %>%
group_by(id) %>%
filter(! (n() >= 2 & cat %in% LETTERS[2:26]))
# A tibble: 7 x 2
# Groups: id [7]
id cat
<dbl> <chr>
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
6 6 A
7 7 D
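Explanation: n() gives the current group size. When a group has two rows (or, in the second example, two or more), we drop the rows whose cat is not "A" and keep the rest; groups with a single row are left untouched.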
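To see what the condition inside filter() evaluates to, one option is to materialise it as columns first. This is only an illustrative sketch on the question's original data; the column names size and drop are invented for the example.

```r
library(dplyr)

df <- data.frame(id  = c(1, 2, 3, 3, 4, 5),
                 cat = c("A", "A", "A", "B", "B", "B"))

flags <- df %>%
  group_by(id) %>%
  mutate(size = n(),                          # number of rows in the current id group
         drop = size == 2 & cat == "B") %>%   # TRUE only for the "B" row of a two-row group
  ungroup()

# Keeping the rows where drop is FALSE matches filter(!(n() == 2 & cat == "B"))
kept <- flags %>% filter(!drop) %>% select(id, cat)
```

Only the "B" row of id 3 is flagged, which is why ids 4 and 5 keep their single "B" row.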
CodePudding user response:
One approach with dplyr. After grouping by id, filter where there is only one row per id or cat is "A".
library(dplyr)
df %>%
group_by(id) %>%
filter(n() == 1 | cat == "A")
Output
id cat
<dbl> <chr>
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
Also, if it is possible to have the same cat repeated within a single id, you can filter where the number of distinct cat is 1 (or keep if cat is "A"):
df %>%
group_by(id) %>%
filter(n_distinct(cat) == 1 | cat == "A")
CodePudding user response:
Using base R
subset(df, cat == 'A'|id %in% names(which(table(id) == 1)))
id cat
1 1 A
2 2 A
3 3 A
5 4 B
6 5 B
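Note that table(id) == 1 counts rows, so an id whose only category is "B" but which appears twice would be dropped entirely. If that can happen, a possible base R variant counts distinct categories per id with ave() instead. The data frame df_dup is a made-up example with a repeated "B" for id 4.

```r
# Hypothetical data: id 4 has "B" twice, id 3 has both "A" and "B"
df_dup <- data.frame(id  = c(1, 2, 3, 3, 4, 4, 5),
                     cat = c("A", "A", "A", "B", "B", "B", "B"))

# Number of distinct cat values per id, repeated for every row of that id.
# cat is converted to integer codes so that ave() returns a numeric vector.
n_cat <- ave(as.integer(factor(df_dup$cat)), df_dup$id,
             FUN = function(x) length(unique(x)))

# Keep rows from single-category ids, plus any "A" row
res <- df_dup[n_cat == 1 | df_dup$cat == "A", ]
```

Here both "B" rows of id 4 are kept, while the "B" row of id 3 is removed.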