I need to split a piece of text into an array of words.
The delimiters between words are newlines, spaces, various punctuation marks, and &nbsp;.
My code handles the other cases, but not the &nbsp; case.
Note: I need to handle all cases within the same regex and cannot replace &nbsp; with spaces beforehand.
The code doesn't throw an error; it runs in Chrome, but the result is not the expected value.
In the generated word array, "break&nbsp;up&nbsp;test&nbsp;the&nbsp;words" is a single value (wrong). I need it to be 5 values: [break,up,test,the,words] (right).
My code:
<!DOCTYPE html><html><head>
<script>
window.onload = function(){
var text = document.getElementById('text').textContent
// the &nbsp; alternative in the regex below doesn't work
var word_array = text.split(/[ \t\n\r.?,"';:!(){}<>\/]|&nbsp;/)
console.log(text)
console.log(word_array)
}
</script>
</head><body>
<div id="text">this is text,break&nbsp;up&nbsp;test&nbsp;the&nbsp;words!ok</div>
</body></html>
CodePudding user response:
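The issue is that the regex treats &nbsp; as those six literal characters. By the time you read textContent, the browser has already decoded each &nbsp; entity into a single non-breaking space character (U+00A0), so the literal text never appears in the string. You want to match '\xa0' instead.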
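
A minimal sketch of that fix, assuming the text has already been read from textContent (the sample string below stands in for it, with \u00A0 where the markup had &nbsp;). The function name splitWords and the filter(Boolean) step are illustrative additions, not part of the original code:

```javascript
// Hypothetical helper: splits text on the question's delimiters plus U+00A0.
function splitWords(text) {
  // Same character class as in the question, with \xa0 (non-breaking space)
  // added inside the class instead of a literal "&nbsp;" alternative.
  var delimiters = /[ \t\n\r\xa0.?,"';:!(){}<>\/]/;

  // filter(Boolean) drops empty strings that appear when two delimiters
  // are adjacent (for example ", " or a trailing "!").
  return text.split(delimiters).filter(Boolean);
}

// Stand-in for document.getElementById('text').textContent
var text = 'this is text,break\u00A0up\u00A0test\u00A0the\u00A0words!ok';
var word_array = splitWords(text);

console.log(word_array);
```

Because \xa0 sits inside the existing character class, everything is still handled by one regex and nothing has to be replaced with spaces first.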