How can I write an R function to reverse complement DNA?

Time:11-17

ReverseFunc <- function(base) {
  if(base == 'A' | base ==  'a') print("T") 
  else(base == 'T' | base == 't') print("A")
  else(base == 'G' | base == 'g') print("C")
  else(base == 'C' | base == 'c') print("G")
}

base <- 'ATCG' #This is your test data
comseq <- ReverseFunc(base)
print(comseq)

It does not work. Also, I have to write the function myself: I must not use a function from an R package.

CodePudding user response:

There's a little known but powerful base R function to do this: chartr

chartr(old="ATGC", new="TACG", x )

base <- 'ATCG'
chartr("ATGC", "TACG", base )
#[1] "TAGC"

To get the lower-case letters converted as well, just add them as another "old" sequence, with the same capitalized sequence as the "new" values:

chartr("ATGCatgc", "TACGTACG", base )

It seems pretty obvious how to use that as the body of a new function, and since it might be homework I'm leaving that as "an exercise for the reader."

The error that prevented your version from working was two-fold: 1) using if, and 2) assuming that the base string would be handled character-wise. In R, if is not vectorised: its condition must be a single logical value. So even if you had broken the input character value into its component letters, it would only have tested the first letter (and since R 4.2.0 a condition of length greater than one is an error).

And now I see the final portion of your homework problem. I'd have to argue that "not use a function from an R package" does not apply here, because chartr is in the base installation. The instructor cannot possibly mean that base installation functions cannot be used, since even the function function would be proscribed in that case.
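Returning to the two errors described above, here is a minimal sketch of why a string is not handled character-wise. The variable names dna, letters_in_dna and is_a are illustrative only and do not come from the question.

```r
dna <- "ATCG"

# A string is a single element, so comparing it to one base yields one value.
length(dna)             # 1
dna == "A"              # FALSE: "ATCG" is not equal to "A"

# strsplit() returns a list; [[1]] extracts the vector of single letters.
letters_in_dna <- strsplit(dna, "")[[1]]
length(letters_in_dna)  # 4

# The comparison is vectorised and gives one logical value per letter,
# which is more than a single if() condition can take.
is_a <- letters_in_dna == "A"
is_a                    # TRUE FALSE FALSE FALSE
```

A vectorised tool such as ifelse, chartr or match can consume all four values at once, whereas if needs exactly one.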

CodePudding user response:

There are several issues in your function:

  1. Your function assumes that the parameter — base — is a single base (or a vector of bases) rather than a character string which is a concatenation of bases. You’ll first need to split your string, and you need to use ifelse instead of if to perform vectorised comparisons.
  2. For alternative branches you need else if instead of just else.
  3. Your function attempts to output the resulting base via print. This isn’t appropriate. Instead, you need to return the base.
  4. Lastly, this function attempts to compute a complement, but it does not reverse it, despite its name. (The name is also questionable: It should be ReverseComp or some variation thereof, and Func has no place in the name of a function: it’s redundant and uninformative.)

Put together, we’re left with this:

reverse_complement <- function (dna) {
    bases = toupper(strsplit(dna, '')[[1L]])
    compl = ifelse(bases == 'A', 'T',
        ifelse(bases == 'C', 'G',
        ifelse(bases == 'G', 'C',
        ifelse(bases == 'T', 'A', 'N'))))

    paste(rev(compl), collapse = '')
}

… however, this nested ifelse call is fairly convoluted. The chartr solution in the other answer makes this much more readable, and also more efficient. There are other possible solutions (e.g. using switch) but they’re not much better either. Short of chartr, I’d use a lookup table with match.
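As a rough sketch of what the chartr route might look like once the reversal is added (the function name reverse_complement_chartr is made up for illustration, and the input is assumed to be a single character string):

```r
# Hypothetical name; assumes `dna` is a single character string.
reverse_complement_chartr <- function(dna) {
    # Complement every base in one pass, folding lower case to upper case.
    compl <- chartr("ATGCatgc", "TACGTACG", dna)
    # Reverse by splitting into letters, reversing, and re-joining.
    paste(rev(strsplit(compl, "")[[1]]), collapse = "")
}

reverse_complement_chartr("ATCG")
#[1] "CGAT"
```

One difference from the ifelse version: chartr leaves any character outside the "old" set unchanged rather than replacing it with 'N', so unexpected input passes through silently.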

CodePudding user response:

Adding a separate answer (and again not offering a complete answer since it's homework). Based on Konrad's suggestion of match one could align an input to output translation thusly:

c("T","A","C","G","T","A","C","G")[ match( strsplit(base, "")[[1]], 
c("A","T","G","C","a","t","g","c")) ]
#[1] "T" "A" "G" "C"

And using toupper would make it even more compact:

c("T","A","C","G")[ match( toupper(strsplit(base, "")[[1]]), 
c("A","T","G","C")) ]
#[1] "T" "A" "G" "C"

The lesson to be learned here is that match is designed to return a numeric index, which is then typically used as an argument to the extraction function [ to pull out items from a result vector, in use cases like:

  possible_results_to_choose[ match( input, choices) ]
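To see that pattern step by step, here is a small sketch that also shows what happens when an input has no match. The names choices, results, input and idx are illustrative, and the "X" in the input is a deliberately invalid base.

```r
choices <- c("A", "T", "G", "C")
results <- c("T", "A", "C", "G")

# "X" is not a valid base and is included to show the no-match case.
input <- strsplit("ATXG", "")[[1]]

# match() gives the position of each input letter in `choices`,
# or NA where the letter is not found.
idx <- match(input, choices)
idx            # 1 2 NA 3

# Indexing with NA yields NA, so unknown letters stay visible in the result.
results[idx]   # "T" "A" NA "C"
```

The NA values make unexpected letters easy to detect, for example with anyNA, before the result is pasted back into a string.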