Home > Back-end >  Subset based on conditions on multiple columns
Subset based on conditions on multiple columns

Time:07-28

I have a combination table based on a "colour list":

library(tidyr)

#give available colour combo
colour= c("BK", "CS", "DB","DG","LB","LG","MV","OR","RD","WT","YL","ID","EMPTY")

combo = expand.grid(colour, colour,colour, colour)
#function "colnames" to change column names
colnames(combo) <- c('A','B','C','D')

However I'm not interested in all possible combinations. I want to be able to filter this table by removing the combination that I don't want. For example removing rows that have more than 1 "ID" and rows that have more than 1 "EMPTY". (removing rows like: "ID ID OR LB" or "EMPTY ID LB EMPTY".

I also want to remove rows that have no "ID" at all or no "EMPTY" at all.

CodePudding user response:

Using your code, you can do the following:

# Indices of rows with more than 1 "EMPTY"
  rem1 <- which(apply(combo,1,function(x) sum(x=="EMPTY"))>=2)
# Indices of rows with more than one "ID"
  rem2 <- which(apply(combo,1,function(x) sum(x=="ID"))>=2)

# Subtract rows in "rem1" and "rem2" to get combo2
  combo2 <- combo[-c(rem1, rem2),]

Elements rem1 and rem2 can be combined into a unique object; nevertheless, I separated them so that you can see which indices fulfil each condition.

CodePudding user response:

A base solution:

combo[!(rowSums(combo == 'ID') > 1 | rowSums(combo == 'EMPTY') > 1), ]

With dplyr, you could also use rowSums across():

library(dplyr)

combo %>%
  filter( !(rowSums(across(A:D) == 'ID') > 1 | rowSums(across(A:D) == 'EMPTY') > 1) )
  • Related