How to check if HTML or script tags exist in input in JavaScript using RegEx

We want to prevent users from entering scripts or HTML tags in an input, to avoid cross-site scripting (XSS) attacks.

For this I wrote the following code, but it does not seem to work:

var preventScriptsRegEx = new RegExp("[^<>]*");
    
    function getValue()  {
        return document.getElementById("myinput").value;
    }
    
    function test() {
        alert(preventScriptsRegEx.test(getValue()));
    }

This is inspired by this post: Prevent html tags entries in mvc textbox using regular expression

CodePudding user response：

You can try creating a temporary element, setting its innerHTML property to the input's value, and checking the element's childElementCount:

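// Returns true when the text parses into at least one HTML element.
function checkForHTML(text) {
  var elem = document.createElement('div');
  elem.innerHTML = text;
  return elem.childElementCount > 0;
}

// Look the elements up explicitly instead of relying on id-based globals.
var inputEl = document.getElementById('input');
var buttonEl = document.getElementById('button');

buttonEl.addEventListener('click', function () {
  console.log(checkForHTML(inputEl.value));
});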

<input id="input">
<button id="button">Check</button>

CodePudding user response：

Please don't do this. You can't rely on a clever RegExp to detect script injection. There are plenty of attack vectors that a RegExp simply cannot match reliably, for example Unicode escape sequences (the \uXXXX form) or HTML entity encoding (< becomes &lt;, &#60; or &#x003C;). Such input passes your validation but is decoded again later, so that execution is still possible. I've been writing such exploits for fun, so I can guarantee you that there are almost as many ways to defeat this kind of check as there is creativity in an attacker's mind.
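To make the encoding point concrete, here is a minimal sketch. The anchored pattern and the decodeEntities helper are illustrative assumptions, not code from the question. The helper only imitates, for a few entity forms, the decoding that happens when the text is later parsed as HTML.

```javascript
// The pattern from the question: "*" allows an empty match and there are no
// anchors, so test() succeeds for every input.
const original = new RegExp("[^<>]*");

// A hypothetical anchored variant that really rejects literal "<" and ">".
const anchored = /^[^<>]*$/;

// Hypothetical helper: decodes &lt;, &gt; and numeric entities only, as a
// stand-in for the decoding an HTML parser would perform later.
function decodeEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">");
}

const plain = "<script>alert(1)</script>";
const encoded = "&#x003C;script&#62;alert(1)&lt;/script&gt;";

const results = {
  originalAcceptsPlain: original.test(plain),     // true, for any input
  anchoredAcceptsPlain: anchored.test(plain),     // false
  anchoredAcceptsEncoded: anchored.test(encoded), // true: no literal < or >
  decoded: decodeEntities(encoded),               // the tag is back
};
console.log(results);
```

Even the stricter pattern accepts the encoded payload, which is why the check gives a false sense of safety.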

The right way to protect yourself from script injection/XSS is to not trust user-generated content in the first place. Do not trust "validation logic" either. You shouldn't accept HTML, JS or CSS code that is generated on the client side. Never. You should never save such content in a database, or transfer it by any other means, and render it again. User-generated content that is, or could be, CSS, HTML or JS is evil and should be treated like a ticking nuclear bomb.

Any content that the client sends to the server and that is rendered again on the client side in some way must not be sanitized, but explicitly rendered via (htmlElement).innerText = userContent (pseudo code). innerText is guaranteed not to create any DOM nodes other than text nodes, which is the only way to be sure that you're safe. Never render user content in place into HTML or CSS. Remark: I can also use CSS for XSS, e.g. via vendor-specific CSS extensions.
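As a rough sketch of that rule, the helper below writes user content as text only. The name renderUserText and the element id in the usage comment are assumptions made for illustration. It uses textContent rather than innerText; both assign the string as plain text and never parse it as markup.

```javascript
// Hypothetical helper: writes user content as text, never as markup.
function renderUserText(element, userContent) {
  // Assigning textContent replaces the element's children with a single
  // text node; nothing in the string is parsed as HTML.
  element.textContent = String(userContent);
  return element;
}

// Browser usage (assumes an element with id "comment" exists):
// renderUserText(document.getElementById("comment"), "<img src=x onerror=alert(1)>");
// The page then shows the literal characters instead of creating an <img>.
```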

Example: behavior:url(script.htc); -moz-binding: url(script.xml#mycode);

Never use .innerHTML = either. Never let user-generated code directly affect the DOM at all; never do <div>render($content)</div> or anything like that.

For content that should be styled, use a DSL that separates the text content from its context information. It could be JSON or, if you need something simple, any other DSL such as Markdown. Then, in code you trust, loop through that data structure and create the HTML / DOM elements, always rendering the user-generated content via .innerText or an API that is guaranteed to behave the same way (React, for example, renders strings as text unless you explicitly use innerHTML or dangerouslySetInnerHTML, which is just sabotage). Also, don't allow user-generated content to set HTML element attributes. I can XSS that too.
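One way this could look, as a minimal sketch: the node shape ({ style, text }), the style names and the renderNodes function are all hypothetical. The document object is passed in as a parameter so that the trusted rendering code is the only part that touches the DOM.

```javascript
// Hypothetical mini-DSL: each node picks a style from a fixed list and
// carries its text separately.
const TAG_FOR_STYLE = new Map([
  ["plain", "span"],
  ["bold", "strong"],
  ["italic", "em"],
]);

function renderNodes(doc, container, nodes) {
  for (const node of nodes) {
    // Unknown styles fall back to "span"; user data never chooses a tag name.
    const tag = TAG_FOR_STYLE.get(node.style) || "span";
    const element = doc.createElement(tag);
    element.textContent = String(node.text); // text only, never parsed as HTML
    container.appendChild(element);
  }
  return container;
}

// Browser usage (assumes an element with id "output" exists):
// renderNodes(document, document.getElementById("output"), [
//   { style: "bold", text: "Hello" },
//   { style: "plain", text: " <script>alert(1)</script>" },
// ]);
```

The user controls only which entry of a fixed list is used and what the text says; tag names and attributes come from trusted code.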

Example: <a href="javascript:alert('XSS!')" />
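If links from users cannot be avoided entirely, trusted code can at least restrict the URL scheme before setting the attribute. The sketch below is one possible approach, not a complete defense. The function name isSafeHref, the allowlist and the placeholder base URL are assumptions; the URL class is the standard one available in browsers and Node.js.

```javascript
// Hypothetical allowlist of URL schemes accepted for user-supplied links.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

function isSafeHref(value, base = "https://example.invalid/") {
  try {
    // The URL parser trims surrounding whitespace and lowercases the scheme,
    // so "  JaVaScRiPt:..." is still reported as "javascript:".
    const url = new URL(value, base);
    return ALLOWED_PROTOCOLS.has(url.protocol);
  } catch {
    return false; // unparsable input is rejected
  }
}

console.log(isSafeHref("javascript:alert('XSS!')")); // false
console.log(isSafeHref("https://example.com/page")); // true
```

Checking the parsed scheme against an allowlist avoids pattern matching on the raw string, which runs into the same encoding tricks described above.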