How to filter out duplicates in 2-column data frame but in reverse orientation?

Time:09-14

I have a data frame of gene pairs. Some pairs are listed twice, once in each orientation (for example, gene1 = BICRA, gene2 = ASXL1 in one row and gene1 = ASXL1, gene2 = BICRA in another). How do I remove these reverse-orientation duplicates so that each pair appears only once? Thanks!

> dput(all_pairs)
structure(list(gene1 = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 
2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 7L, 8L, 
9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 10L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("ASXL1", "BICRA", 
"CCDC168", "HRAS", "MUC16", "NOTCH1", "OBSCN", "PLEC", "RREB1", 
"TTN"), class = "factor"), gene2 = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L), .Label = c("ASXL1", "BICRA", 
"CCDC168", "HRAS", "MUC16", "NOTCH1", "OBSCN", "PLEC", "RREB1", 
"TTN"), class = "factor")), out.attrs = list(dim = c(10L, 10L
), dimnames = list(Var1 = c("Var1=ASXL1", "Var1=BICRA", "Var1=CCDC168", 
"Var1=HRAS", "Var1=MUC16", "Var1=NOTCH1", "Var1=OBSCN", "Var1=PLEC", 
"Var1=RREB1", "Var1=TTN"), Var2 = c("Var2=ASXL1", "Var2=BICRA", 
"Var2=CCDC168", "Var2=HRAS", "Var2=MUC16", "Var2=NOTCH1", "Var2=OBSCN", 
"Var2=PLEC", "Var2=RREB1", "Var2=TTN"))), class = "data.frame", row.names = c(NA, 
-90L))
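For readers who would rather rebuild a similar input than paste the dput() output, here is a minimal sketch. It assumes, as an inference from the out.attrs attribute and not something stated in the question, that the frame came from expand.grid() with self-pairs (a gene paired with itself) removed. The gene names come from the factor labels above.

```r
# gene names taken from the factor labels in the dput() output
genes <- c("ASXL1", "BICRA", "CCDC168", "HRAS", "MUC16",
           "NOTCH1", "OBSCN", "PLEC", "RREB1", "TTN")

# every ordered combination of two genes; gene1 varies fastest, as in the dput() output
grid <- expand.grid(gene1 = genes, gene2 = genes)

# drop self-pairs such as ASXL1/ASXL1, leaving 10 * 9 = 90 ordered pairs
all_pairs <- grid[grid$gene1 != grid$gene2, ]
rownames(all_pairs) <- NULL

nrow(all_pairs)
```

Each unordered pair then appears exactly twice, once per orientation.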

Answer:

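Sort the two gene names within each row so that both orientations of a pair become the same row, then drop every row that repeats an earlier one. This keeps exactly one copy of each pair, whatever its orientation: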

# sort each row so both orientations match, then keep the first occurrence of each pair
all_pairs[!duplicated(t(apply(all_pairs, 1, sort))), ]
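A vectorised alternative, not part of the original answer: pmin() and pmax() build the same order-independent key column by column instead of sorting row by row with apply(). This is a minimal sketch. The helper name drop_reverse_dups, its arguments, and the small input frame are hypothetical.

```r
# Hypothetical helper: drop rows whose two columns repeat an earlier pair in either order
drop_reverse_dups <- function(df, a = "gene1", b = "gene2") {
  # as.character() because pmin()/pmax() cannot compare unordered factor levels meaningfully
  x <- as.character(df[[a]])
  y <- as.character(df[[b]])
  # order-independent key: alphabetically smaller name first, larger name second
  key <- data.frame(lo = pmin(x, y), hi = pmax(x, y))
  # duplicated() on a data frame flags rows whose key was already seen
  df[!duplicated(key), , drop = FALSE]
}

# small made-up input: row 3 is row 1 in reverse orientation
pairs <- data.frame(
  gene1 = c("BICRA", "HRAS", "ASXL1", "TTN"),
  gene2 = c("ASXL1", "ASXL1", "BICRA", "HRAS")
)

dedup <- drop_reverse_dups(pairs)
print(dedup)
```

Here row 3 is dropped because its key (ASXL1, BICRA) matches row 1. Applied to all_pairs from the question (90 ordered pairs of 10 distinct genes, no self-pairs), both this helper and the apply()/sort() line should leave 45 rows, one per unordered pair. Avoiding the per-row apply() call may help on large tables.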