Part of my table looks like this:
X | Y | Channel | Well |
---|---|---|---|
123 | 123 | 1 | B3 |
123 | 123 | 2 | B3 |
123 | 123 | 1 | B4 |
123 | 123 | 2 | B4 |
123 | 123 | 1 | B5 |
123 | 123 | 2 | B5 |
123 | 123 | 3 | B5 |
123 | 123 | 1 | B6 |
123 | 123 | 2 | B6 |
123 | 123 | 3 | B6 |
What I want to do is to filter out
- channel no 2 for the well that has only 2 channels
- channel no 3 that has only 3 channels
For example, for B3 and B4, channel 2 has to be removed, for B5 and B6, channel 3 has to be removed.
Thank you so much!
CodePudding user response:
Here is one option using tidyverse
, where we create a new helper column to count the number of distinct channels (in case of duplicates). Then, we can specify the 2 conditions in a new column, which will be used to filter
out the required rows. Then, we can just select the original columns (i.e., select(names(df))
).
library(tidyverse)
df %>%
group_by(Well) %>%
mutate(n = n_distinct(Channel),
remove = case_when(n == 2 & Channel == 2 ~ TRUE,
n == 3 & Channel == 3 ~ TRUE,
TRUE ~ FALSE)) %>%
ungroup %>%
filter(!remove) %>%
select(names(df))
Output
X Y Channel Well
<int> <int> <int> <chr>
1 123 123 1 B3
2 123 123 1 B4
3 123 123 1 B5
4 123 123 2 B5
5 123 123 1 B6
6 123 123 2 B6
Or could be written shorter as:
df %>%
group_by(Well) %>%
filter(!(n_distinct(Channel) == 2 & Channel == 2) &
!(n_distinct(Channel) == 3 & Channel == 3)) %>%
ungroup
Or in base R, you could do something like this:
agg <- aggregate(data = df, cbind(n = Channel) ~ Well, function(x) n = length(unique(x)))
result <- subset(merge(df, agg, by = "Well", all = TRUE), !(n == 2 & Channel == 2) &
!(n == 3 & Channel == 3), select = -n)
Data
df <- structure(list(X = c(123L, 123L, 123L, 123L, 123L, 123L, 123L,
123L, 123L, 123L), Y = c(123L, 123L, 123L, 123L, 123L, 123L,
123L, 123L, 123L, 123L), Channel = c(1L, 2L, 1L, 2L, 1L, 2L,
3L, 1L, 2L, 3L), Well = c("B3", "B3", "B4", "B4", "B5", "B5",
"B5", "B6", "B6", "B6")), class = "data.frame", row.names = c(NA,
-10L))
CodePudding user response:
If the rule is to remove the N-th channel in a Well
that has N channels, then the general method is
library(dplyr)
df %>%
group_by(Well) %>%
filter(Channel != n_distinct(Channel)) %>%
ungroup()
# # A tibble: 6 × 4
# X Y Channel Well
# <int> <int> <int> <chr>
# 1 123 123 1 B3
# 2 123 123 1 B4
# 3 123 123 1 B5
# 4 123 123 2 B5
# 5 123 123 1 B6
# 6 123 123 2 B6