I have a FASTA DNA sequence in a text area, and I want to validate the text against the pattern given below. If the text does not follow the pattern, an error should be shown.
The pattern is: each sequence has a heading that starts with the ">" symbol, followed by alphanumeric and special characters. Then, on a new line, the text should contain only letters from ["A","G","T","C"].
>sfr 354t:5
AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTTG
TTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC
>NC_000024.10:c2787682-2786855 Homo sapiens chromosome Y, GRCh38.p14 Primary Assembly
AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTT
GTTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGC
TGTGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAA
GTATCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGC
ATTCATCGTGTGGTCTCGCGATCAGAGGCGCAAGATGGCTCTAGAGAATCCCAGAATGCGAAACTCAGA
GATCAGCAAGCAGCTGGGATACCAGTGGAAAATGCTTACTGAAGCCGAAAAATGGCCATTCTTCCAGGA
GGCACAGAAATTACAGGCCATGCACAGAGAGAAATACCCGAATTATAAGTATCGACCTCGTCGGAAGGC
This is what I tried:
$("#fasta_text").on('change keyup paste', function(e) {
  var seq = $(this).val();
  if (!seq.match(/> .*[a-z] \n[AGCT]/igm))
    e.preventDefault();
});
CodePudding user response:
We can try using the following regex pattern:
\s*>\S+(?: \S+)*\s+[ACGT]+(?:\s+[ACGT]+)*
Sample script:
var input = ` >sfr 354tfv
AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTTG
TTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC
>vkgi234 n.39
TAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC`;
if (input.match(/\s*>\S+(?: \S+)*\s+[ACGT]+(?:\s+[ACGT]+)*/)) {
  console.log("MATCH");
}
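The pattern above has no ^ or $ anchors, so match() succeeds as soon as one well-formed record appears anywhere in the input, even if the text after it is invalid. Below is a minimal sketch of a whole-input check built from the same pieces, assuming one or more records are allowed. The names FASTA_RECORDS and isValidFasta are hypothetical, not part of the answer.

```javascript
// Hypothetical helper: true only if the ENTIRE input is one or more records.
// One record = ">" plus header words separated by single spaces, then one or
// more whitespace-separated runs of A, C, G, T.
const FASTA_RECORDS = /^\s*(?:>\S+(?: \S+)*\s+[ACGT]+(?:\s+[ACGT]+)*\s*)+$/;

function isValidFasta(text) {
  return FASTA_RECORDS.test(text);
}

console.log(isValidFasta(">seq1 demo\nACGT\nGGCC\n>seq2\nTTAA")); // true
console.log(isValidFasta(">seq1\nACGT\nnot dna"));                // false
```

Like the original pattern, this sketch uses \s+ between the header and the bases, so it does not strictly require the sequence to start on a new line.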
CodePudding user response:
You might use a pattern without any flags (or /i if you want a case-insensitive match).
Note that you have the events change keyup paste keypress in the .on call, which may fire many times and may in this case give a "No match for: " first.
const regex = /^[^\S\n]*>[^\s>].*(?:\n[^\S\n]*[AGTC]+)+$/;
Explanation
^    Start of string
[^\S\n]*    Match optional spaces
>[^\s>]    Match > followed by a non-whitespace char other than >
.*    Match the rest of the line
(?:    Non-capture group to repeat as a whole part
\n[^\S\n]*[AGTC]+    Match a newline, optional spaces and 1+ times any of A G T C
)+    Close the non-capture group and repeat 1+ times
$    End of string
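To see how the pieces fit together, here is a small sketch that runs the pattern against a few hand-written inputs. The sample strings and the names singleRecord, samples and results are made up for illustration. As written, the pattern describes a single record: a second > header line is not made of A, G, T, C, so multi-record input is rejected.

```javascript
const singleRecord = /^[^\S\n]*>[^\s>].*(?:\n[^\S\n]*[AGTC]+)+$/;

// Made-up inputs, one per case the explanation covers.
const samples = {
  valid: ">sfr 354t:5\nAGAAGT\nTTTTGA",  // header + two sequence lines
  lowercase: ">sfr 354t:5\nagaagt",      // fails unless the /i flag is added
  badBase: ">sfr 354t:5\nAGAXGT",        // X is not one of A, G, T, C
  noSequence: ">sfr 354t:5",             // the group must repeat at least once
  twoRecords: ">a\nACGT\n>b\nTTAA",      // second header breaks the match
};

// Map each sample name to whether the whole string matches.
const results = Object.fromEntries(
  Object.entries(samples).map(([name, text]) => [name, singleRecord.test(text)])
);
console.log(results);
```

Only the valid sample matches; the other four are rejected.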
$(document).ready(function() {
  // Header line, then one or more lines made up only of A, G, T, C.
  const regex = /^[^\S\n]*>[^\s>].*(?:\n[^\S\n]*[AGTC]+)+$/;
  $('#fasta_text').on('change keyup paste keypress', function(e) {
    const seq = $(this).val();
    if (!seq.match(regex)) {
      console.log("No match for: " + seq);
    } else {
      console.log("Match!");
    }
  });
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<textarea id="fasta_text" rows="5" cols="80"></textarea>
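Since the question asks for an error to be shown, it may help to know where the input goes wrong, which a single regex test cannot report. Below is a minimal line-by-line sketch. It assumes that multiple records are allowed, that blank lines are ignored, and that only uppercase A, G, T, C are valid bases. The function findFastaError and its messages are hypothetical, not part of either answer.

```javascript
// Hypothetical helper: returns null when the text is valid under the
// assumptions above, otherwise { line, message } for the first problem found.
function findFastaError(text) {
  const lines = text.split(/\r?\n/);
  let sawHeader = false; // has any ">" header been seen yet?
  let seqLines = 0;      // sequence lines seen since the last header

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // assumption: blank lines are ignored

    if (line.startsWith(">")) {
      if (line.length === 1) {
        return { line: i + 1, message: "Header has no text after '>'" };
      }
      if (sawHeader && seqLines === 0) {
        return { line: i + 1, message: "Previous header has no sequence" };
      }
      sawHeader = true;
      seqLines = 0;
    } else if (!sawHeader) {
      return { line: i + 1, message: "Sequence data before any '>' header" };
    } else if (!/^[AGTC]+$/.test(line)) {
      return { line: i + 1, message: "Only A, G, T, C are allowed in a sequence" };
    } else {
      seqLines++;
    }
  }

  if (!sawHeader) return { line: 1, message: "No '>' header found" };
  if (seqLines === 0) {
    return { line: lines.length, message: "Last header has no sequence" };
  }
  return null;
}

console.log(findFastaError(">a b\nACGT\n>c\nTTAA")); // null
console.log(findFastaError(">a\nACGT\nAGXT"));       // first problem is on line 3
```

Inside the event handler, the result of findFastaError($(this).val()) could be used to show or clear a message next to the textarea.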