How can JavaScript break up a piece of text into individual words and wrap each word in a span tag?

Time:07-05

I wrote a piece of JavaScript code and want to implement two functions:

1: Break a piece of text into an array of separate words. So far, I split the text with a regex that matches spaces and punctuation. That covers part of the job, but it does not split on the non-breaking space entity &nbsp;.

2: Wrap each word in the HTML in a span tag. (I don't know how to implement this.)

This is the code:

<!DOCTYPE html>
<html>
<head>
    <script>
    window.onload = function() {
        var text = document.getElementById('text').textContent
            // Regex cannot search for `&nbsp;`
        var word_array = text.split(/[ \t\n\r.?,"';:!()[\]{}<>\/]/)
        console.log(text)
        console.log(word_array)
    }
    </script>
</head>

<body> 
    the other text
    <div id="text">
        this is
        text,break&nbsp;up&nbsp;&nbsp;the;words!
        istest testis,
        text <a href="#">text build</a> html tag! 
    </div>
    the other text
</body>

</html>

However, my code does not separate words that are joined by &nbsp;. For example, break&nbsp;up&nbsp;&nbsp;the should become [break, up, the].

Also, I have not managed to wrap every word in the div in a span tag, like this:

<div id="text">
<span id='word_1'>this</span> <span id='word_2'>is</span>
...
<span id='word_3'>text</span> <a href="#"><span id='word_4'>text</span> <span id='word_5'>build</span></a> <span id='word_6'>html</span> <span id='word_7'>tag</span>!
</div>

CodePudding user response:

\s will do the job. You can change:

var word_array = text.split(/[ \t\n\r.?,"';:!()[\]{}<>\/]/)
                              ^^

to:

var word_array = text.split(/[\s\t\n\r.?,"';:!()[\]{}<>\/]/)
                              ^^

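By the way, \s already covers the space, \t, \n and \r. In JavaScript it also matches \f, \v and Unicode spaces such as the non-breaking space U+00A0, which is the character &nbsp; becomes in textContent. So you can simplify your expression to: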

var word_array = text.split(/[\s.?,"';:!()[\]{}<>\/]/)

Then you may need to remove the empty elements from the array:

//remove '' from word_array
var word_array2 = word_array.filter(e => e != '')
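
Putting the two steps together, here is a minimal sketch you can run outside the browser. The helper name splitWords is hypothetical, and the sample string uses \u00a0 on the assumption that textContent has already decoded each &nbsp; into that character. Adding + after the character class makes a run of separators count as a single split point, so fewer empty strings are produced in the first place.

```javascript
// Hypothetical helper; the separator list is the one from the question.
function splitWords(text) {
    // \s matches U+00A0, the character that &nbsp; decodes to.
    return text
        .split(/[\s.?,"';:!()[\]{}<>\/]+/)
        // A leading or trailing separator still leaves one empty string.
        .filter(function (word) { return word !== ''; });
}

var sample = 'text,break\u00a0up\u00a0\u00a0the;words!';
console.log(splitWords(sample));
// → [ 'text', 'break', 'up', 'the', 'words' ]
```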

For question 2, the following code wraps each word of the text in a span tag (edited based on the comment from @dong):

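function add_span(word_array, element_) {
    // element_ is a string of markup, since .replace() is called on it
    for (let i = 0; i < word_array.length; i++) {
        // "\\s" is required in a string literal; a bare "\s" would turn into a plain "s"
        var reg = new RegExp("([\\s.?,\"';:!(){}<>])(" + word_array[i] + ")([\\s.?,\"';:!])", 'g');
        element_ = element_.replace(reg, '$1<span>$2</span>$3');
    }
    return element_
}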
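
If you also want the numbered ids from the question (word_1, word_2, ...), one alternative is to walk the HTML string in a single pass instead of building one regex per word. The sketch below is a different technique from the function above, not a correction of it: the name wrapWords is hypothetical, and it assumes simple, well-formed markup with no > inside attribute values. It splits the string into tags, entities and plain text, then wraps words only in the plain-text pieces.

```javascript
// Hypothetical helper: wraps each word of an HTML string in a numbered span.
// Tags and character entities such as &nbsp; pass through untouched.
function wrapWords(html) {
    var counter = 0;
    // Splitting on a capturing group keeps the tags and entities in the
    // result at the odd indexes; the even indexes hold plain text.
    var parts = html.split(/(<[^>]*>|&#?\w+;)/);
    return parts.map(function (part, index) {
        if (index % 2 === 1) {
            return part; // a tag or an entity: leave as is
        }
        // A word is any run of characters that is not a separator.
        return part.replace(/[^\s.?,"';:!()[\]{}<>\/]+/g, function (word) {
            counter += 1;
            return "<span id='word_" + counter + "'>" + word + "</span>";
        });
    }).join('');
}

console.log(wrapWords('text <a href="#">text build</a> html tag!'));
```

In the page from the question this could be applied as el.innerHTML = wrapWords(el.innerHTML), where el is the div with id "text". Note that replacing innerHTML rebuilds the child nodes, so any event listeners attached to them would be lost.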