How to remove HTML tags in JavaScript but keep the "<" character when the tag is not closed

Is there a way to strip HTML tags in JavaScript while keeping the `<` character when a tag is not closed, without escaping HTML characters?

For example, given the string `<html>efrferrefrer<wedw`, the result should be `efrferrefrer<wedw`.

I tried both of these:

    function removeHtmlTags(input){
        let tmp = document.createElement("div");
        tmp.innerHTML = input;
        return tmp.textContent || tmp.innerText || "";
    }
    //or
    function removeHtmlTags(input){
        return input.replace(/<[^>]*>?/gm, '');
    }

but neither gives the desired result: both remove `<wedw`.
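The regex attempt fails because the `>?` at the end makes the closing bracket optional, so the unclosed `<wedw` at the end of the string still counts as a tag. A minimal sketch, assuming a "tag" is anything between `<` and the next `>` with no other `<` in between (the name `stripClosedTags` is hypothetical, and this is still a regex, not an HTML parser):

```javascript
// Hypothetical helper: removes only complete "<...>" sequences.
// [^<>]* cannot cross another "<", so an unclosed "<wedw" is never matched,
// and in "a<b<c>" only "<c>" is removed.
function stripClosedTags(input) {
  return input.replace(/<[^<>]*>/g, "");
}

console.log(stripClosedTags("<html>efrferrefrer<wedw")); // efrferrefrer<wedw
console.log(stripClosedTags("<p>1 < 2</p>"));             // 1 < 2
console.log(stripClosedTags("a<b<c>"));                   // a<b
```

It will still mishandle a `>` inside an attribute value, a comment or `<script>` content, which is why a real parser is the reliable route when the input is arbitrary HTML.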

Is there a way to do this without a function that escapes HTML characters, such as this one?

    function escapeHtml(text) {
        // Map each HTML-special character to its entity.
        var map = {
            '&': '&amp;',
            '<': '&lt;',
            '>': '&gt;',
            '"': '&quot;',
            "'": '&#039;'
        };
        return text.replace(/[&<>"']/g, function(m) { return map[m]; });
    }

The result has to be exactly `efrferrefrer<wedw`.

Answer:

Thanks @Apostolos for your answer, but as @T.J. Crowder wrote, you can't use a simple regex to parse HTML reliably.

Anyway, thanks to you I found the problem after some experimenting: it was how the browser interprets the result. What `console.log` prints is different from what the browser displays, and also from what ends up in the page after the regex result is inserted into an element with `innerHTML`, because the browser parses that string again, treats the unclosed `<wedw` as the start of a tag and drops it.
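The stripped string itself can already be exactly `efrferrefrer<wedw`; the `<wedw` only disappears when that string is written back as HTML. A sketch, assuming the regex from this answer with its missing `+` restored and a hypothetical `toDisplayHtml` helper: keep the stripped string as the real value, and escape it only at the moment it is inserted as HTML.

```javascript
// The answer's regex (equivalent to /<[A-Z]+>{1}/igm): removes <tag>
// sequences made of letters only, so "<wedw" (no ">") is kept.
function removeHtmlTagsRegex(input) {
  return input.replace(/<[A-Z]+>/gi, "");
}

// Hypothetical helper: escapes a plain string so that, when assigned to
// innerHTML, the browser shows it literally instead of parsing "<wedw".
function toDisplayHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#039;" };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

const value = removeHtmlTagsRegex("<html>efrferrefrer<wedw");
console.log(value);                // efrferrefrer<wedw    (the actual string)
console.log(toDisplayHtml(value)); // efrferrefrer&lt;wedw (only for innerHTML)
```

In the browser you can also skip the escaping by assigning the string with `textContent` or jQuery's `.text()`, which insert it as text rather than markup.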

Try it here (open it in full-page mode):

    <!doctype html>
    <html lang="en">
    <head>
        <script src="https://code.jquery.com/jquery-3.6.1.js" integrity="sha256-3zlB5s2uwoUzrXK3BT7AX3FyvojsraNFxCc2vC/7pNI=" crossorigin="anonymous"></script>
    </head>
    <body>
        <h4>Tag remover</h4>
        <div>
            <label for="a">HTML</label><br>
            <textarea id="a" name="a" rows="5" style="width:100%;"></textarea>
        </div>
        <div id="output"></div>
        <div id="output2"></div>

    <script>
        // Let the browser's HTML parser handle the markup, then read back only the text.
        function removeHtmlTagsParsed(input){
            const tmp = document.createElement("div");
            tmp.innerHTML = input;
            const output = tmp.textContent || tmp.innerText || "";
            console.log('parsed: ', output);
            return output;
        }

        // Remove only <tag> sequences made of letters; an unclosed "<wedw" has no ">" and is kept.
        function removeHtmlTagsRegex(input){
            const output = input.replace(/<[A-Z]+>{1}/igm, '');
            console.log('regex: ', output);
            return output;
        }

        // Both results are inserted with .html(), so the browser parses them again and
        // drops an unclosed "<wedw" from the display; the console shows the real string.
        $(document).on('change keyup keydown paste', '#a', function(e){
            $("#output").html('<h4>Result parsed</h4>' + removeHtmlTagsParsed($(this).val()));
            $("#output2").html('<h4>Result parsed with regex</h4>' + removeHtmlTagsRegex($(this).val()));
        });
    </script>

    </body>
    </html>

Answer:

I'm not sure I understood correctly, but if the goal is just to remove `<xxxx>` and `</xxxx>` tags, try this:

    const p = '<html>efrferrefrer<wedw';
    // Simpler version: only tags made of letters, without attributes.
    // const regex = /<\/{0,1}[A-Z]+>{1}/igm;
    // Also matches attributes such as id="a" or style="width:100%;".
    const regex = /<\/{0,1}[0-9:%;A-Z\s="]+>{1}/igm;
    console.log(p.replace(regex, ''));

    const p2 = '<html>efrferrefrer<wedw</html><html>';
    console.log(p2.replace(regex, ''));

    const p3 = `<head>test header
    </head>
    <body>
        <h4>Tag remover</h4></unclose
        <div>
                <label for="a">HTML</label><br>
                <textarea id="a" name="a" rows="5" style="width:100%;"></textarea>
        </div>
        <div id="output"></div>
        <div id="output2"></div>`;
    console.log(p3.replace(regex, ''));
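The character class in that regex decides which tags get removed: a tag is stripped only if everything between `<` (or `</`) and `>` is a letter, digit, whitespace or one of `:%;="`. A small check, reusing the answer's regex with the `+` restored (the function name `removeTags` is just for illustration), shows both sides of that trade-off:

```javascript
// The answer's regex, written with `\/?` instead of `\/{0,1}` and without the
// redundant `{1}` and `m` flag; the matching behavior is the same.
const tagRegex = /<\/?[0-9:%;A-Z\s="]+>/gi;

function removeTags(input) {
  return input.replace(tagRegex, "");
}

// Unclosed "<wedw" and "</unclose" are kept, complete tags are removed.
console.log(removeTags("<html>efrferrefrer<wedw</html><html>")); // efrferrefrer<wedw
console.log(removeTags("<h4>Tag remover</h4></unclose"));        // Tag remover</unclose

// A tag whose attributes contain characters outside the class ("/" and "."
// here) is left untouched, while the plain closing tag is removed.
console.log(removeTags('<a href="https://x.y">link</a>'));       // <a href="https://x.y">link
```

Widening the class to cover more attribute values quickly turns it into `[^<>]`, with the same limits as any regex-based stripper.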