I am working my way through the Rosalind problems and have become stuck on the problem Ordering Strings of Varying Length Lexicographically.
So far I have succesfully managed to create all the right permutations of letters of varying lengths. The main issue is now how to sort them based on the order the letters were supplied.
The example input are the letters D N A
. But can be at most 12 unique letters in n <= 4 permutations.
For the example n = 3.
This yields 39 different permutations with replications, but these are then to be sorted lexicographically in the order of D
before N
before A
.
Meaning that the correct order is:
Correct | Incorrect |
---|---|
D | A |
DD | AA |
DDD | AAA |
DDN | AAD |
DDA | AAN |
DN | AD |
DND | ADA |
DNN | ADD |
DNA | ADN |
DA | AN |
DAD | ANA |
DAN | AND |
DAA | ANN |
... | ... |
AAD | NNA |
AAN | NND |
AAA | NNN |
My first thought was to convert them to factors with levels and then attempt to sort them based on their levels but I cannot quite make it work.
so far I create the list of all permutations and then attempt to sort it, but don't know how to get the sorting to follow the given order of D N A
text_input <- c("D", "N", "A")
n <- 3
empty_df <- data.frame(matrix("", ncol = n))
temp_df <- data.frame()
for (i in n:1) {
temp_df <- data.frame(arrangements::permutations(text_input, k = i, replace = TRUE))
empty_df <- bind_rows(empty_df, temp_df)
}
result_df <- replace(empty_df, is.na(empty_df), "") |>
unite(col = combined, everything(), sep = "", remove = FALSE) |>
mutate(across(2:(n 2), ~ factor(.x, levels = text_input)),
across(2:(n 2), ~ str_replace_na(.x, replacement = "")))
result_vec <- tail(result_df$combined, -1)
CodePudding user response:
I'll use the sample data you have, Correct
, randomized to ensure we get the order right.
quux <- structure(list(Correct = c("D", "DD", "DDD", "DDN", "DDA", "DN", "DND", "DNN", "DNA", "DA", "DAD", "DAN", "DAA", "AAD", "AAN", "AAA"), Incorrect = c("A", "AA", "AAA", "AAD", "AAN", "AD", "ADA", "ADD", "ADN", "AN", "ANA", "AND", "ANN", "NNA", "NND", "NNN")), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L), class = "data.frame")
set.seed(42)
quuxR <- quux[sample(nrow(quux)),]
quuxR$Correct
# [1] "D" "DDA" "AAA" "DNA" "DA" "DDN" "DD" "AAD" "DNN" "DND" "DAD" "DAA" "AAN" "DAN" "DDD" "DN"
ltrs <- c("D", "N", "A")
inds <- lapply(strsplit(quux$Correct, ""), match, table = ltrs)
inds <- lapply(inds, `length<-`, max(lengths(inds)))
quuxR$Correct[do.call(order, c(do.call(Map, c(f=c, inds)), list(na.last = FALSE)))]
# [1] "D" "DD" "DDD" "DDN" "DDA" "DN" "DND" "DNN" "DNA" "DA" "DAD" "DAN" "DAA" "AAD" "AAN" "AAA"
identical(quux$Correct, quuxR$Correct[do.call(order, c(do.call(Map, c(f=c, inds)), list(na.last = FALSE)))])
# [1] TRUE
Steps:
strsplit(.., "")
splits a string into individual letters.match(.., table=ltrs)
replaces a letter with the index inltrs
, which gives us the proper priority/sort of the letters.`length<-`
is because we are going to subsequentlyrbind
them, but that will only work if they all have the same length. Doing this step (along withmax(lengths(..))
fills the shorter vectors withNA
s so that all sub-vectors have the same number of elements.do.call(Map, c(f=c, inds))
transposes a list withm
vectors lengthn
into an
-length list, each withm
elements. This is very helpful to have for the next step ...do.call(order, ..)
is similar to callingorder(L[[1]], L[[2]], L[[3]])
(ifL
is the transposed list), but more programmatic. We addlist(na.last=FALSE)
as an argument to the list of vectors so that we can get the right ordering of shorter strings.
CodePudding user response:
You can use chartr
and order
its result and use this to subset the original unordered vector.
x <- c("DD", "AAN", "DDD", "DNA", "DAA", "DAD", "DDN", "DDA", "DND",
"DAN", "DNN", "DA", "AAA", "AAD", "DN", "D") #Unordered
y <- c("D", "DD", "DDD", "DDN", "DDA", "DN", "DND", "DNN", "DNA", "DA",
"DAD", "DAN", "DAA", "AAD", "AAN", "AAA") #Target order
z <- x[order(chartr("DNA", "ABC", x))]
identical(z, y)
#[1] TRUE
z
# [1] "D" "DD" "DDD" "DDN" "DDA" "DN" "DND" "DNN" "DNA" "DA" "DAD" "DAN"
#[13] "DAA" "AAD" "AAN" "AAA"