Create a column which is a collapsed sorted pair of two existing columns?

I would like to create a new column that holds the values of these 2 columns, sorted within each row and joined by an underscore (using dplyr's mutate). How can I do this?

> dput(x)
structure(list(sgRNA1_Approved_Symbol = c("ADAD1", "ADAD1", "ADAD1", 
"ADAD1", "ADAD1", "ADAD1", "ADAD1", "ADAD1", "ADAD1", "ADAD1"
), sgRNA2_Approved_Symbol = c("AKT1", "AKT1", "AKT1", "AKT1", 
"BRD4", "BRD4", "BRD4", "BRD4", "MYC", "MYC")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

This is what I tried:

x %>% 
  mutate(sorted_gene_pair = paste(sort(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol), collapse = '_'))

CodePudding user response：

How about

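x %>% 
   # pmin()/pmax() compare element-wise, so each row gets its own sorted pair;
   # inside mutate() the columns can be referenced directly, without x$
   mutate(sorted_gene_pair = paste0(
     pmin(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol),
     "_",
     pmax(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol))
   )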
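
To check that this really sorts within each row, here is a minimal sketch on made-up data where the same two genes show up in both orders. The tibble `toy` and its columns `gene_a` and `gene_b` are hypothetical names used only for illustration; they are not part of the original data.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 hold the same pair in opposite order.
toy <- tibble(
  gene_a = c("MYC", "ADAD1", "BRD4"),
  gene_b = c("ADAD1", "MYC", "AKT1")
)

# pmin()/pmax() take the alphabetically smaller/larger string per row,
# so both orderings of a pair collapse to the same key.
result <- toy %>%
  mutate(sorted_gene_pair = paste(pmin(gene_a, gene_b),
                                  pmax(gene_a, gene_b),
                                  sep = "_"))

print(result)
```

Rows 1 and 2 should end up with the same key, which is usually the point of building a sorted pair. String comparison follows the session's locale, so mixed-case symbols may order differently across systems.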

CodePudding user response：

x$sorted_gene_pair <- apply(t(apply(x[, c('sgRNA1_Approved_Symbol', 'sgRNA2_Approved_Symbol')], 1, sort)), 1, paste, collapse = '_')

This also works but is quite messy, so I was wondering if there is a dplyr method.
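
One possible dplyr version of the same row-by-row idea uses rowwise(). The attempt in the question fails because sort() takes a single vector (the second column is read as its `decreasing` argument) and because `collapse` would glue the whole column into one string. Below is a sketch under the assumption that dplyr 1.0 or later is installed; the small tibble `toy` is made-up data with the question's column names, and `toy_pairs` is just an illustrative name.

```r
library(dplyr)

# Hypothetical two-row example using the column names from the question.
toy <- tibble(
  sgRNA1_Approved_Symbol = c("MYC", "ADAD1"),
  sgRNA2_Approved_Symbol = c("AKT1", "BRD4")
)

toy_pairs <- toy %>%
  rowwise() %>%                      # evaluate mutate() one row at a time
  mutate(sorted_gene_pair = paste(
    # combine the two values of this row into one vector, then sort it
    sort(c(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol)),
    collapse = "_"
  )) %>%
  ungroup()                          # drop the rowwise grouping afterwards

print(toy_pairs)
```

rowwise() tends to be slower than the vectorised pmin()/pmax() approach on large tables, but it extends naturally to more than two columns.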

CodePudding user response：

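You could sort the entire data frame first and then paste the two columns together. Note that arrange() orders the rows, not the two values within each row, so this gives a sorted pair only when sgRNA1_Approved_Symbol already sorts before sgRNA2_Approved_Symbol, as it does in the sample data.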

library(stringr)  # provides str_c()

x %>%
  arrange(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol) %>% 
  mutate(
    sorted_gene_pair = str_c(sgRNA1_Approved_Symbol, "_", sgRNA2_Approved_Symbol)
  )