Conditionally remove rows from data frames-CodePudding

Part of my table looks like this:

X	Y	Channel	Well
123	123	1	B3
123	123	2	B3
123	123	1	B4
123	123	2	B4
123	123	1	B5
123	123	2	B5
123	123	3	B5
123	123	1	B6
123	123	2	B6
123	123	3	B6

What I want to do is to filter out

channel no 2 for the well that has only 2 channels
channel no 3 that has only 3 channels

For example, for B3 and B4, channel 2 has to be removed, for B5 and B6, channel 3 has to be removed.

Thank you so much!

CodePudding user response：

Here is one option using tidyverse, where we create a new helper column to count the number of distinct channels (in case of duplicates). Then, we can specify the 2 conditions in a new column, which will be used to filter out the required rows. Then, we can just select the original columns (i.e., select(names(df))).

library(tidyverse)

df %>%
  group_by(Well) %>%
  mutate(n = n_distinct(Channel),
         remove = case_when(n == 2 & Channel == 2 ~ TRUE,
                            n == 3 & Channel == 3 ~ TRUE,
                            TRUE ~ FALSE)) %>%
  ungroup %>% 
  filter(!remove) %>% 
  select(names(df))

Output

      X     Y Channel Well 
  <int> <int>   <int> <chr>
1   123   123       1 B3   
2   123   123       1 B4   
3   123   123       1 B5   
4   123   123       2 B5   
5   123   123       1 B6   
6   123   123       2 B6

Or could be written shorter as:

df %>%
  group_by(Well) %>%
  filter(!(n_distinct(Channel) == 2 & Channel == 2) &
           !(n_distinct(Channel) == 3 & Channel == 3)) %>%
  ungroup

Or in base R, you could do something like this:

agg <- aggregate(data = df, cbind(n = Channel) ~ Well, function(x) n = length(unique(x)))

result <- subset(merge(df, agg, by = "Well", all = TRUE), !(n == 2 & Channel == 2) &
         !(n == 3 & Channel == 3), select = -n)

Data

df <- structure(list(X = c(123L, 123L, 123L, 123L, 123L, 123L, 123L, 
123L, 123L, 123L), Y = c(123L, 123L, 123L, 123L, 123L, 123L, 
123L, 123L, 123L, 123L), Channel = c(1L, 2L, 1L, 2L, 1L, 2L, 
3L, 1L, 2L, 3L), Well = c("B3", "B3", "B4", "B4", "B5", "B5", 
"B5", "B6", "B6", "B6")), class = "data.frame", row.names = c(NA, 
-10L))

CodePudding user response：

If the rule is to remove the N-th channel in a Well that has N channels, then the general method is

library(dplyr)

df %>%
  group_by(Well) %>%
  filter(Channel != n_distinct(Channel)) %>%
  ungroup()

# # A tibble: 6 × 4
#       X     Y Channel Well
#   <int> <int>   <int> <chr>
# 1   123   123       1 B3
# 2   123   123       1 B4
# 3   123   123       1 B5
# 4   123   123       2 B5
# 5   123   123       1 B6   
# 6   123   123       2 B6

X	Y	Channel	Well
123	123	1	B3
123	123	2	B3
123	123	1	B4
123	123	2	B4
123	123	1	B5
123	123	2	B5
123	123	3	B5
123	123	1	B6
123	123	2	B6
123	123	3	B6

X	Y	Channel	Well
123	123	1	B3
123	123	2	B3
123	123	1	B4
123	123	2	B4
123	123	1	B5
123	123	2	B5
123	123	3	B5
123	123	1	B6
123	123	2	B6
123	123	3	B6

X	Y	Channel	Well
123	123	1	B3
123	123	2	B3
123	123	1	B4
123	123	2	B4
123	123	1	B5
123	123	2	B5
123	123	3	B5
123	123	1	B6
123	123	2	B6
123	123	3	B6