Validate DNA FASTA sequence using Javascript and RegEx

Time:10-06

I have a FASTA DNA sequence. I want a text area to validate the FASTA text against the pattern given below. If the text does not follow the pattern, an error should be shown.

The pattern is: each sequence has a heading line that starts with the ">" symbol, followed by alphanumeric and special characters. Then, starting on a new line, the text should contain only the letters from ["A","G","T","C"].

>sfr 354t:5
AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTTG
TTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC


>NC_000024.10:c2787682-2786855 Homo sapiens chromosome Y, GRCh38.p14 Primary Assembly
AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTT
GTTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGC
TGTGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAA
GTATCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGC
ATTCATCGTGTGGTCTCGCGATCAGAGGCGCAAGATGGCTCTAGAGAATCCCAGAATGCGAAACTCAGA
GATCAGCAAGCAGCTGGGATACCAGTGGAAAATGCTTACTGAAGCCGAAAAATGGCCATTCTTCCAGGA
GGCACAGAAATTACAGGCCATGCACAGAGAGAAATACCCGAATTATAAGTATCGACCTCGTCGGAAGGC

This is what I tried:

$("#fasta_text").on('change keyup paste', function(e) {
    var seq = $(this).val();
    if (!seq.match(/>+.*[a-z]+\n[AGCT]/igm))
        e.preventDefault();
});

CodePudding user response:

We can try using the following regex pattern:

\s*>\S+(?: \S+)*\s+[ACGT]+(?:\s+[ACGT]+)*

Sample script:

var input = `    >sfr 354tfv
AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTTG
TTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC

>vkgi234 n.39
TAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC`;

if (input.match(/\s*>\S+(?: \S+)*\s+[ACGT]+(?:\s+[ACGT]+)*/)) {
    console.log("MATCH");
}
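
Note that match() without anchors succeeds as soon as any part of the input fits the pattern, so text with stray characters after a valid record would still log "MATCH". A minimal sketch of an anchored variant follows, assuming the whole textarea content has to consist of one or more records. The names record, wholeInput and matchesWholeInput are made up for this example.

```javascript
// Same building blocks as the pattern above, describing a single record.
const record = />\S+(?: \S+)*\s+[ACGT]+(?:\s+[ACGT]+)*/;

// Assumption: the input is one or more records separated by whitespace,
// with optional whitespace before the first and after the last record.
const wholeInput = new RegExp(
  "^\\s*" + record.source + "(?:\\s+" + record.source + ")*\\s*$"
);

// Returns true only when the entire text fits, not just a fragment of it.
function matchesWholeInput(text) {
  return wholeInput.test(text);
}

console.log(matchesWholeInput(">a b\nACGT\nGGCC\n\n>c\nTTAA\n")); // true
console.log(matchesWholeInput(">id\nACGT\nXYZ"));                 // false
```
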

CodePudding user response:

You might use a pattern without any flags (or /i if you want a case-insensitive match).

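Note that the handler is bound to the events change keyup paste keypress in the .on call, so it may fire several times for a single edit. Some of those events fire before the textarea value has been updated, which is why you may see a "No match for: " first.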
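
One way to reduce the repeated calls is to skip validation when the value has not changed since the last event. This is only a sketch: onlyWhenChanged is a hypothetical helper, not part of jQuery, and the wiring in the comments is an assumption about how it would be hooked up.

```javascript
// Wraps a validation callback so it runs only when the value differs
// from the one seen on the previous call.
function onlyWhenChanged(validate) {
  let lastValue = null;
  return function (value) {
    if (value === lastValue) {
      return; // same text as last time, nothing new to validate
    }
    lastValue = value;
    validate(value);
  };
}

// Assumed wiring with the textarea from the question:
// const check = onlyWhenChanged(function (seq) { console.log(seq.length); });
// $('#fasta_text').on('change keyup paste keypress', function () {
//   check($(this).val());
// });
```

Listening to the input event alone is another option, since it fires after the value has changed.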

const regex = /^[^\S\n]*>[^\s>].*(?:\n[^\S\n]*[AGTC]+)+$/;

Explanation

  • ^ Start of string
  • [^\S\n]* Match optional whitespace characters other than a newline
  • >[^\s>] Match > followed by a non-whitespace character other than >
  • .* Match the rest of the line
  • (?: Non-capturing group, repeated as a whole
    • \n[^\S\n]*[AGTC]+ Match a newline, optional spaces and 1 or more of A, G, T, C
  • )+ Close the non-capturing group and repeat it 1 or more times
  • $ End of string
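
To see how the parts combine, the pattern can be tried against a few short inputs. The sample strings and the names fastaRecord and check are made up for illustration.

```javascript
const fastaRecord = /^[^\S\n]*>[^\s>].*(?:\n[^\S\n]*[AGTC]+)+$/;

// Returns true when the whole string is one header line followed by
// one or more lines made up only of A, G, T and C.
function check(text) {
  return fastaRecord.test(text);
}

console.log(check(">sfr 354t:5\nAGAAGT\nTTTTTG")); // true
console.log(check(">sfr\nAGAXGT"));                // false, X is not allowed
console.log(check("AGAAGT"));                      // false, no header line
console.log(check(">>bad\nACGT"));                 // false, > may not follow >
console.log(check(">a\nACGT\n"));                  // false, trailing newline
console.log(check(">a\nACGT\n>b\nACGT"));          // false, second record
```

As the last two cases show, without flags the pattern accepts exactly one record and no trailing newline, so trimming the value first or checking each record separately may be needed for pasted multi-record text.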


$(document).ready(function() {
  const regex = /^[^\S\n]*>[^\s>].*(?:\n[^\S\n]*[AGTC]+)+$/;
  $('#fasta_text').on('change keyup paste keypress', function(e) {
    const seq = $(this).val();
    if (!seq.match(regex)) {
      console.log("No match for: " + seq);
    } else {
      console.log("Match!")
    }
  });
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<textarea id="fasta_text" rows="5" cols="80"></textarea>
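
Since the question asks for an error to be shown, a line-by-line check may be easier to work with than a single regex, because it can say which line is wrong. The following is a sketch under assumed rules, not a full FASTA parser: blank lines are ignored, a header is ">" plus at least one more character, sequence lines contain only uppercase A, C, G, T, and every header needs at least one sequence line. The name validateFasta and the #fasta_error element in the comments are hypothetical.

```javascript
// Returns { ok: true } or { ok: false, line, message } for the first problem.
function validateFasta(text) {
  const lines = text.split(/\r?\n/);
  let inRecord = false;   // true once a header line has been seen
  let sequenceLines = 0;  // sequence lines seen since the last header

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    const lineNo = i + 1; // 1-based line number for error messages
    if (line === "") {
      continue; // assumption: blank lines between records are allowed
    }
    if (line.startsWith(">")) {
      if (line.length === 1) {
        return { ok: false, line: lineNo, message: "Header is empty" };
      }
      if (inRecord && sequenceLines === 0) {
        return { ok: false, line: lineNo, message: "Previous header has no sequence" };
      }
      inRecord = true;
      sequenceLines = 0;
    } else if (!inRecord) {
      return { ok: false, line: lineNo, message: "Sequence data before the first header" };
    } else if (!/^[ACGT]+$/.test(line)) {
      return { ok: false, line: lineNo, message: "Only A, C, G and T are allowed" };
    } else {
      sequenceLines++;
    }
  }

  if (!inRecord) {
    return { ok: false, line: 1, message: "No FASTA record found" };
  }
  if (sequenceLines === 0) {
    return { ok: false, line: lines.length, message: "Last header has no sequence" };
  }
  return { ok: true };
}

// Assumed wiring, with a hypothetical <div id="fasta_error"></div> on the page:
// $('#fasta_text').on('input', function () {
//   const result = validateFasta($(this).val());
//   $('#fasta_error').text(result.ok ? "" : "Line " + result.line + ": " + result.message);
// });
```
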
