I would like to create a new column that contains the values of these two columns, sorted within each row and joined by an underscore (using dplyr's mutate). How can I do this?
> dput(x)
structure(list(sgRNA1_Approved_Symbol = c("ADAD1", "ADAD1", "ADAD1",
"ADAD1", "ADAD1", "ADAD1", "ADAD1", "ADAD1", "ADAD1", "ADAD1"
), sgRNA2_Approved_Symbol = c("AKT1", "AKT1", "AKT1", "AKT1",
"BRD4", "BRD4", "BRD4", "BRD4", "MYC", "MYC")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
This is what I tried:
x %>%
mutate(sorted_gene_pair = paste(sort(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol), collapse = '_'))
CodePudding user response:
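How about pmin() and pmax(), which compare the two columns element-wise, so each row gets its own sorted pair:
x %>%
  mutate(sorted_gene_pair = paste0(
    # inside mutate() the columns can be referenced directly, without x$
    pmin(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol),
    "_",
    pmax(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol)
  ))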
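To check that this really sorts within each row, here is a minimal sketch on made-up data. The tibble toy and its columns gene_a and gene_b are hypothetical names, not from the question, and the second row is deliberately in reverse order:

```r
library(dplyr)

# Hypothetical toy data; the second row is in reverse alphabetical order
toy <- tibble(
  gene_a = c("ADAD1", "MYC"),
  gene_b = c("AKT1", "BRD4")
)

res <- toy %>%
  mutate(
    # pmin()/pmax() pick the smaller/larger string element-wise, per row
    sorted_gene_pair = paste0(pmin(gene_a, gene_b), "_", pmax(gene_a, gene_b))
  )

res$sorted_gene_pair
# "ADAD1_AKT1" "BRD4_MYC"
```

String comparison follows the collation order of the current locale, so results for mixed-case or non-ASCII symbols may differ between machines.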
CodePudding user response:
x$sorted_gene_pair <- apply(t(apply(x[, c('sgRNA1_Approved_Symbol', 'sgRNA2_Approved_Symbol')], 1, sort)), 1, paste, collapse = '_')
This also works but is quite messy, so I was wondering whether there is a dplyr method.
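The same row-wise idea can probably be written with dplyr's rowwise(). This is only a sketch on made-up data (toy2, gene_a and gene_b are hypothetical names), and it will likely be slower than a vectorised approach on large tables:

```r
library(dplyr)

# Hypothetical toy data; the second row is in reverse alphabetical order
toy2 <- tibble(
  gene_a = c("ADAD1", "MYC"),
  gene_b = c("AKT1", "BRD4")
)

res2 <- toy2 %>%
  rowwise() %>%   # evaluate the mutate() expression once per row
  mutate(sorted_gene_pair = paste(sort(c(gene_a, gene_b)), collapse = "_")) %>%
  ungroup()       # drop the row-wise grouping again

res2$sorted_gene_pair
# "ADAD1_AKT1" "BRD4_MYC"
```

Unlike an element-wise minimum/maximum, this form extends to more than two columns by adding them inside c().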
CodePudding user response:
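You could sort the entire data frame first and then paste the columns together. Note that arrange() orders the rows, not the two values within a row, so the pair only comes out sorted when the first column already sorts before the second in every row, as it does in your sample data.
library(dplyr)
library(stringr)  # for str_c()
x %>%
  # order the rows by the first gene, then by the second
  arrange(sgRNA1_Approved_Symbol, sgRNA2_Approved_Symbol) %>%
  mutate(
    sorted_gene_pair = str_c(sgRNA1_Approved_Symbol, "_", sgRNA2_Approved_Symbol)
  )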