I wrote some JavaScript and want to implement two things:
1: Split a piece of text into an array of words.
So far I have used a regex to split on spaces and punctuation. It partly works, but it does not split on the whitespace character code used in the HTML.
2: Wrap each word in the HTML in a span tag. (I don't know how to implement this.)
Here is the code:
<!DOCTYPE html>
<html>
<head>
<script>
window.onload = function() {
var text = document.getElementById('text').textContent
// Regex cannot search for ` `
var word_array = text.split(/[ \t\n\r.?,"';:!()[\]{}<>\/]/)
console.log(text)
console.log(word_array)
}
</script>
</head>
<body>
the other text
<div id="text">
this is
text,break up the;words!
istest testis,
text <a href="#">text build</a> html tag!
</div>
the other text
</body>
</html>
However, my code does not separate these three words. For example, break up the should become [break,up,the].
I also have not managed to wrap every word in the div in span tags, like this:
<div id="text">
<span id='word_1'>this</span> <span id='word_2'>is</span>
...
<span id='word_3'>text</span> <a href="#"><span id='word_4'>text</span> <span id='word_5'>build</span></a> <span id='word_6'>html</span> <span id='word_7'>tag</span>!
</div>
CodePudding user response:
\s will do the job. You can change:
var word_array = text.split(/[ \t\n\r.?,"';:!()[\]{}<>\/]/)
                              ^
to:
var word_array = text.split(/[\s\t\n\r.?,"';:!()[\]{}<>\/]/)
                              ^^
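By the way, in JavaScript \s already matches the space, \t, \n, \r, \f and \v, plus other Unicode whitespace such as the no-break space (U+00A0), so listing \t\n\r next to it is redundant. You can simplify your expression to: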
var word_array = text.split(/[\s.?,"';:!()[\]{}<>\/]/)
Then you may need to remove the empty elements from the array:
// remove '' from word_array
var word_array2 = word_array.filter(e => e !== '')
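Putting the two steps together, a minimal sketch could look like the following. The helper name splitWords and the sample string are hypothetical and not from the question; the sketch also adds a + after the character class so that runs of separators produce fewer empty strings before filtering.

```javascript
// Hypothetical helper: split text into words on whitespace and punctuation.
function splitWords(text) {
  // \s covers the no-break space (U+00A0) as well as space, tab and newlines.
  return text
    .split(/[\s.?,"';:!()[\]{}<>\/]+/)
    .filter((word) => word !== ''); // drop the empty string left by a trailing separator
}

// Sample input (an assumption): words separated by no-break spaces and punctuation.
const sample = 'this\u00a0is\ntext,break\u00a0up\u00a0the;words!';
console.log(splitWords(sample));
```

In the page from the question, the input would presumably be document.getElementById('text').textContent.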
For question 2, the following code wraps the words in the text with span tags:
(Edited based on the comment from @dong.)
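function add_span(word_array, element_) {
  // element_ is the HTML string; word_array holds the words to wrap
  for (let i = 0; i < word_array.length; i++) {
    // "\\s" is required here: inside a string literal, "\s" would turn into a plain "s"
    var reg = new RegExp("([\\s.?,\"';:!(){}<>])(" + word_array[i] + ")([\\s.?,\"';:!])", 'g');
    element_ = element_.replace(reg, '$1<span>$2</span>$3');
  }
  return element_
}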
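As an alternative to looping over the word list, a single pass over the HTML string can wrap each word and number the ids the way the question's expected output does. This is only a sketch under assumptions: the name wrapWords is hypothetical, the markup is assumed to be simple (no > inside attribute values, no script or style content), and tags and character entities are passed through unchanged.

```javascript
// Hypothetical helper: wrap every word outside of tags in a numbered span.
function wrapWords(html) {
  let counter = 0;
  // Alternatives are tried in order: a tag, then an entity, then a word.
  // The word class excludes <, > and &, so a word never starts inside a tag or entity.
  const token = /<[^>]*>|&[#\w]+;|[^\s.?,"';:!()[\]{}<>\/&]+/g;
  return html.replace(token, (match) => {
    if (match[0] === '<' || match[0] === '&') {
      return match; // leave tags and entities untouched
    }
    counter += 1;
    return `<span id='word_${counter}'>${match}</span>`;
  });
}

// Sample input (an assumption), loosely based on the markup in the question.
console.log(wrapWords('text <a href="#">text build</a> html tag!'));
```

In the page from the question it could presumably be applied with something like el.innerHTML = wrapWords(el.innerHTML), where el is the div with id "text". Because whitespace and punctuation are never matched, they stay in place between the spans.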