HTML/JavaScript quoting bug in all major browsers?

Time:05-03

Assume you want to show an alert with the string content <!-- Comment --> <script type="text/javascript"></script> in JavaScript on an HTML page. You can do that with the following code:

<!DOCTYPE html>
<html>
<head>
<title>Quoting</title>
</head>
<body>
<script type="text/javascript">
   alert('<!-- Comment --> <script type="text/javascript"></script\u003E');
</script>
</body>
</html>

Note the escaped > character in the </script> part of the text. It uses a JavaScript Unicode escape (\u003E) to prevent the HTML parser from interpreting this part of the string literal as the end of the script element. The code above works perfectly in Firefox, Chrome and IE.

Now apply the same > escaping to the end of the comment within the string literal. This should change nothing, because the XML comment syntax should not be interpreted within script tags in HTML (and apparently is not interpreted, because the comment syntax was shown in the alert):

<!DOCTYPE html>
<html>
<head>
<title>Quoting</title>
</head>
<body>
<script type="text/javascript">
   alert('<!-- Comment --\u003E <script type="text/javascript"></script\u003E');
</script>
</body>
</html>

Interestingly, this code breaks: the alert is not shown when the page loads. The problem can be reproduced at least in Firefox, Chrome and IE. Am I missing something in the specs, or is this a "browser-independent" bug in the HTML parser of all major browsers?

The DOM inspector shows the following:

[Screenshot of the DOM inspector; image not available]

It looks like the rest of the document is interpreted as part of the script in this case.

Any ideas?

Answer:

This is not a bug.

The first < puts the parser into the "script data less-than sign" state, then the ! puts it into the "script data escape start" state, and so on.

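Replacing the > that ended the HTML comment with an escape sequence means the parser never comes out of the "dealing with a comment" state. While it is in that state, the <script inside the string literal opens a nested level, so the real </script> only closes that nested level and the script element runs on until the end of the HTML document.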
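The state changes described above can be sketched as a small scanner. This is a deliberately simplified model of the script data states, not the full tokenizer from the HTML specification; the function name findScriptEnd and the three state labels are made up for illustration. It takes the text that follows the opening script tag and reports where the element would end.

```javascript
// Simplified, hypothetical model of how an HTML parser scans script contents.
// Returns the index of the end tag that closes the element, or -1 if none does.
function findScriptEnd(html, start = 0) {
  const lower = html.toLowerCase();
  // A tag name only counts when it is followed by whitespace, "/" or ">".
  const tagAt = (name, i) =>
    lower.startsWith(name, i) &&
    /[\t\n\f\r \/>]/.test(lower.charAt(i + name.length));

  let state = "data"; // "data" | "escaped" | "doubleEscaped"
  let i = start;
  while (i < lower.length) {
    if (state === "data" && lower.startsWith("<!--", i)) {
      state = "escaped";
      i += 2; // keep the two dashes so that "<!-->" closes immediately
    } else if (state !== "data" && lower.startsWith("-->", i)) {
      state = "data"; // the comment-like section is over
      i += 3;
    } else if (state === "escaped" && tagAt("<script", i)) {
      state = "doubleEscaped"; // nested "script" inside the escaped section
      i += 7;
    } else if (tagAt("</script", i)) {
      if (state !== "doubleEscaped") return i; // this end tag closes the element
      state = "escaped"; // only the nested level is closed
      i += 8;
    } else {
      i += 1;
    }
  }
  return -1; // reached the end of the document without closing the element
}

// The two pages from the question, starting right after the opening script tag.
const tail = "');\n</script>\n</body>\n</html>";
const working =
  `alert('<!-- Comment --> <script type="text/javascript"></script\\u003E` + tail;
const broken =
  `alert('<!-- Comment --\\u003E <script type="text/javascript"></script\\u003E` + tail;

console.log(findScriptEnd(working) >= 0); // true: the real end tag closes the element
console.log(findScriptEnd(broken));       // -1: the element is never closed
```

In the second page the scanner is still in the nested state when it reaches the real end tag, which matches what the DOM inspector shows: the rest of the document ends up inside the script.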


It is a historical artefact of the hack early HTML used to allow inline scripts without the JS source code showing up on the page for browsers which didn't support the <script> element.
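Given that behaviour, a common way to sidestep the problem is to escape the < character rather than the > character, so that none of the sequences <!--, <script or </script ever appears literally inside the inline script. The helper below is a minimal sketch of that idea; the name toInlineScriptLiteral is hypothetical and it assumes the text is meant to become a JavaScript string literal.

```javascript
// Hypothetical helper: build a JavaScript string literal that contains no "<",
// so the HTML parser cannot see "<!--", "<script" or "</script" inside it.
function toInlineScriptLiteral(text) {
  // JSON.stringify yields a double-quoted literal that is also valid JavaScript;
  // every "<" is then rewritten as the Unicode escape \u003C.
  return JSON.stringify(text).replace(/</g, "\\u003C");
}

const message = '<!-- Comment --> <script type="text/javascript"></script>';
const literal = toInlineScriptLiteral(message);

console.log(literal);                         // safe to paste into alert(...)
console.log(JSON.parse(literal) === message); // the escapes decode to the original text
```

With the < characters escaped, the > characters can stay as they are, since on their own they never start or end one of the special states.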
