For a university project I had to build an environment that includes XSS defense mechanisms. One such mechanism is an input filter, written in PHP, which strips characters that could lead to the execution of an XSS attack. The code looks like this:
static function inputFilter($data) {
    $data = trim($data);            // remove leading/trailing whitespace
    $data = stripslashes($data);    // remove backslashes added by quoting/escaping
    return htmlspecialchars($data); // convert special HTML characters to entities
}
Are there still ways to perform XSS attacks that this filter does not cover? I have already tried to bypass it with Unicode encoding, JavaScript encoding, and null-byte injection.
CodePudding user response:
For the most part, htmlspecialchars() will cover most scenarios, as users will not be able to inject HTML such as Hello<script>alert('XSS attack!!');</script>.
However, it depends on where you are using this input. If it is populating the href attribute of an <a> tag, then pseudo-protocol attacks can still occur, for example:
<a href="javascript:console.log(localStorage....)">Click here</a>
Typically, any attempt at solving XSS yourself should be avoided. Instead, consider using community-driven packages built for this kind of thing, such as voku/anti-xss, which will find these pseudo-protocol calls and sanitize/strip them for you.
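To make the href case concrete, here is a minimal sketch (the attacker-supplied URL value is hypothetical) showing that javascript: contains none of the characters htmlspecialchars() encodes, so it passes through untouched:

$userUrl = 'javascript:alert(document.cookie)'; // hypothetical attacker-supplied "URL"
echo '<a href="' . htmlspecialchars($userUrl, ENT_QUOTES, 'UTF-8') . '">Click here</a>';
// Output: <a href="javascript:alert(document.cookie)">Click here</a>
// The link still runs script when clicked, despite the escaping.

The usual fix is to validate the scheme (allow only http/https) on top of escaping, or to let a library such as voku/anti-xss handle it.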
CodePudding user response:
It's helpful to think of XSS-resistance code as having two phases.
- Validating and sanitizing, what you call input filtering, and
- Escaping, for example converting <script> to &lt;script&gt; so your user's browser won't see your user-furnished data as code.
Validating is the process of checking whether input from your user is correct. For example, if you gather a number, it should not contain any letters. If it does, your code should fail validation, reject the input, and ask the user to try again.
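As a minimal sketch of that reject-on-failure idea (the age field here is just an illustration):

$age = trim($_POST['age'] ?? '');
if (!ctype_digit($age)) {
    // Validation failed: don't try to "repair" the value, just reject it.
    http_response_code(422);
    exit('Please enter your age as a whole number.');
}
$age = (int) $age; // safe to treat as an integer from here on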
Sanitizing is the process of stripping away undesired parts of your user's input before using or storing it. For example, most web apps that accept HTML input from untrusted users sanitize that input by removing all except a subset of HTML tags: <h1> and <p> are not removed, but <script> and <iframe> are.
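A crude illustration of that whitelist idea is PHP's strip_tags() with an allow-list. Note that it keeps tag contents and does nothing about attributes, so it is not a safe HTML sanitizer on its own:

$dirty = '<h1>Hi</h1><p>Hello</p><script>alert("XSS")</script>'; // example input
$clean = strip_tags($dirty, '<h1><p>');
// $clean is now: <h1>Hi</h1><p>Hello</p>alert("XSS")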
You choose validating and sanitizing techniques field by field. If you're gathering a date of birth, for example, your validation task is to make sure the date is valid. If you're gathering an email address, you ensure it is formatted correctly. When validation fails, you reject the input.
PHP offers the data filtering subsystem, with built-in features for validating and sanitizing various common data types.
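For example, a sketch using filter_var() for two common cases (the field names are made up):

$email = filter_var($_POST['email'] ?? '', FILTER_VALIDATE_EMAIL);
if ($email === false) {
    exit('Please enter a valid email address.');
}

$year = filter_var($_POST['birth_year'] ?? '', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1900, 'max_range' => (int) date('Y')],
]);
if ($year === false) {
    exit('Please enter a valid birth year.');
}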
If your input data type is HTML, your validation and sanitization rules will be complex. You can sanitize with htmlspecialchars(), but doing that forces your input to be plain text, not HTML: it escapes everything. That's a good solution for simple systems.
If you are handling actual HTML, the first rule of security code applies: don't write your own security code. Instead, use proven library code. HTML is just too complex to sanitize safely with unproven code. Cybercriminals are smarter and better-motivated than you and me, and they only need to find one flaw to pwn our web apps.
The WordPress project has been maintaining a module called kses. It's an acronym for "kses strips evil scripts". It removes dangerous tags and attributes. There's also Lars Moelleken's anti-xss package.
Again, you validate and sanitize input before you store it for later use.
Then, on output you will escape your stored data.
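For instance, a minimal escaping-on-output sketch ($storedName stands in for data you previously validated and stored); ENT_QUOTES matters because it also encodes quotes inside attribute values:

<?php
$storedName = $row['name'] ?? ''; // hypothetical value loaded from storage
?>
<p>Hello, <?= htmlspecialchars($storedName, ENT_QUOTES, 'UTF-8'); ?>!</p>
<input type="text" value="<?= htmlspecialchars($storedName, ENT_QUOTES, 'UTF-8'); ?>">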