Defense against XSS in Zend Framework
Zend Framework doesn't have its own templating system, so if you want to print some data you need to escape it manually: echo $this->escape($this->userInput).
Manual escaping is risky because it is very easy to forget, for example in this code: echo "Page #$this->num". Unless you know for sure that $this->num will be a number, this code is vulnerable to XSS.
Another problem is that the default escaping function escapes only some of the characters with special meaning in HTML. So this code is vulnerable to XSS even though it escapes user data manually: <span title='<?php echo $this->escape($this->userInput); ?>'>Test</span>. The problematic character is ', which is valid for delimiting HTML attribute values but is not escaped by the default implementation of $this->escape(). So if a malicious user passes ' onmouseover='alert(/XSS/) then they have just attacked our page.
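The difference is easy to reproduce with plain htmlspecialchars(), which the default escape() relies on. The sketch below is not Zend code: the helper name renderSpan is made up for illustration, and the flags are passed explicitly (ENT_COMPAT leaves single quotes alone, ENT_QUOTES encodes them) because the default flags differ between PHP versions.

```php
<?php
// Hypothetical helper: builds the single-quoted attribute from the example
// above, escaping the value with the given htmlspecialchars() flags.
function renderSpan(string $userInput, int $flags): string
{
    $escaped = htmlspecialchars($userInput, $flags, 'UTF-8');
    return "<span title='" . $escaped . "'>Test</span>";
}

$attack = "' onmouseover='alert(/XSS/)";

// ENT_COMPAT: the single quote passes through, so the title value ends
// early and an onmouseover attribute is injected.
echo renderSpan($attack, ENT_COMPAT), "\n";

// ENT_QUOTES: the single quote is encoded as &#039; and stays inside title.
echo renderSpan($attack, ENT_QUOTES), "\n";
```

With ENT_QUOTES the attacker's text remains a harmless title value, whichever quote style the template author chose.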
Zend Framework 2
The authors of Zend Framework were aware of this deficiency, so they provided semi-automatic escaping in Zend Framework 2 (currently in beta). The biggest problem of their implementation is that it escapes only some data. To quote the roadmap:
Have all variables retrieved via __get() be escaped by default, and instead require developers to call a special method when they want the raw value. This will not help in all situations – return values from view variable method calls or properties, or values from arrays would not be escaped in this fashion.
So some data will be escaped automatically by default and some won't. This will confuse template authors even more, resulting in more forgotten escaping. Not to mention the incompatibility with templates from Zend Framework 1, which will be double-escaped if used with Zend Framework 2.
Another problem is that the current implementation is a perfect example of premature escaping. What do you think the following code will print?
<?php
$view->eq = '2';
$view->eq .= ' < ';
$view->eq .= '5';
echo $view->eq;
?>
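The correct answer is 2 &amp;lt; 5, which will render as 2 &lt; 5 in the browser: each .= reads an already escaped value back through __get(), and the final echo escapes it once more.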
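To see how escaping on read compounds, here is a minimal sketch. EscapingView is a hypothetical stand-in, not ZF2's actual view class; it assumes values are stored as given by __set() and escaped on every __get().

```php
<?php
// Hypothetical stand-in for a view that escapes on read; not ZF2's real class.
class EscapingView
{
    private array $data = [];

    public function __set(string $name, $value): void
    {
        $this->data[$name] = $value; // stored as given
    }

    public function __get(string $name)
    {
        // Every read returns an escaped copy of the stored value.
        return htmlspecialchars($this->data[$name], ENT_QUOTES, 'UTF-8');
    }
}

$view = new EscapingView();
$view->eq = '2';
$view->eq .= ' < ';   // reads via __get(), appends, writes back via __set()
$view->eq .= '5';     // the stored "2 < " is read back as "2 &lt; "
echo $view->eq, "\n"; // the final read escapes the stored "2 &lt; 5" again
```

In this sketch the value is escaped once per read, so the number of escaping passes depends on how the string was built rather than on where it is printed.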
Conclusion
Defense against XSS in Zend Framework has these problems:
- Manual escaping, which is easy to forget.
- Not all special characters are escaped by the default escaping function.
- Some data are automatically escaped, some not [ZF2].
- Suffers from double escaping in some cases [ZF2].
Solution
The solution to all these problems is to avoid Zend Views and use a modern templating system instead. The best templating systems provide not only consistent automatic escaping but also context-aware escaping.
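As a rough illustration of what context-aware means, the sketch below picks a different encoding depending on where the value lands. The function names are hypothetical and not taken from any particular templating system; a real engine works out the context by parsing the template rather than asking the author to choose.

```php
<?php
// Free text between tags or inside a quoted attribute value.
function escapeHtml(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// A value embedded in an inline <script> as a JavaScript literal.
// The JSON_HEX_* flags encode <, >, &, ' and " inside the string as \uXXXX,
// and JSON_THROW_ON_ERROR (PHP 7.3+) raises instead of returning false.
function escapeJs($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$name = "McDonald's </script>";
echo "<span title='", escapeHtml($name), "'>", escapeHtml($name), "</span>\n";
echo "<script>var name = ", escapeJs($name), ";</script>\n";
```

The same input needs two different encodings here, which is exactly what a single escape() call cannot know on its own.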
Comments
Jakub Vrána :
Thanks for filling this and I wish you good fight with experts from IRC securing against XSS by backslashes :-).
Ugo:
Yeah zend is piece of shit, Latte in Nette rulez. But seriously now, a bit of text like this could easily have been written in Czech, just like a single sentence in the discussion.
Tomáš Fejfar:
Well, Zend concerns a somewhat bigger audience than just the little Czech basin :) So an article written in English is actually the right thing :P
Marco Pivetta:
Hi there!
Actually, ZF2 had auto-escaping on all variables in views. That is a behavior that in my opinion has to be stripped. You should follow Matthew's work on his zend-view refactoring and eventually PR to his branch if you need these changes to be included (check https://github.com/weierophinney/zf2/commi…/view-layer ).
Also, your concat example doesn't really work as you're using __get and __set implicitly (also bad practice imho).
I must also remember that PHP functionalities aren't really suited for escaping... You should use more hardcore tools like HTMLPurifier when handling such cases... Otherwise you're using the wrong tool for the job :)
Tomáš Fejfar:
Not the point. The point is there IS a tool out there, that handles this. And ZF2 does NOT handle this. => ZF2 lack some very crucial functionality, that is actually possible to implement :)
Ondřej Machulda:
Maybe chance to re-implement the nette context sensitive escaping to ZF2 :-)?
Tomáš Fejfar:
That's the plan. When view layer is little bit stable, we'll try to use Latte as View Module - it should be fairly simple to do, unless Latte has some hidden dependencies (like it would like to send the response itself, or sth like that :))
Ondřej Machulda:
I rather thought about re-implementation, maybe some light-weight, which could become part of ZF upstream. I really doubt any part of Nette will do so.
E.g. context sensitive escape(), able to switch between ENT_COMPAT and escaping single quotes wouldn't be IMO hard to implement.
Jakub Vrána :
Is there any reason not to use ENT_QUOTES every time? The only disadvantage is that the output will be 5 bytes longer for each '.
Ondřej Machulda:
You definitely have a point, and thanks for the English post. And especially the premature escaping seems like an evil.
Though, ad issue #2 - it just copies the default behavior of PHP's htmlspecialchars(). The hardcoded 'ENT_COMPAT' in ZF's escape() doesn't make it really adaptable, but the only thing that can be said is "ZF escape() is just as dumb as htmlspecialchars()". It is neither less nor more XSS vulnerable. The only way out of there would be the nette-like context sensitive escaping, as you wrote.
Ad #3 and #4 - ZF2 view layer is AFAIK in progress at the moment, so let's hope, create issues, talk on mailing list etc., so it won't be screwed up and fits programmers needs :-).
Jakub Vrána :
I see no reason why the default implementation couldn't be htmlspecialchars() with ENT_QUOTES. Current implementation isn't 1:1 htmlspecialchars() anyway because it uses custom encoding.
BTW the ZF1 implementation of the escape() function is awful hack: http://framework.zend.com/code/filedetails.…%2FAbstract.php - search for function escape($var):
1. There is no way how to pass 'htmlspecialchars' or 'htmlentities' with real default parameters.
2. If you set 'htmlEntities' then the encoding is ignored.
So basically the behavior is driven by letter-case of function name...
Pádraic Brady:
Hi Jakub,
I've picked up the issue you note above and we're discussing it internally at ZF but I'd like to clarify a few issues. First of all, the example vulnerability you offer ignores the fact that escaping output is an application's final line of defence. It ignores the first line of defence which is to correctly filter and sanitise user input. Any HTML filter or sanitiser would have crushed your example long before it reached htmlspecialchars(). So your vulnerability arises from the missing input validation and NOT from the Zend Framework escaping.
If you want a more relevant example of this, explain how a string like javascript:alert(document.cookie); would be blocked by escaping? Answer: It wouldn't be. Why? Because the presence of escaping is utterly pointless and futile without the filtering of user input. You CANNOT have one with the other. They are complementary tasks expressed by the common security rule: "Filter Input; Escape Output".
In the future, and prior to publicising a security vulnerability that has not been notified to the relevant developers who have a public security policy, a dedicated security email address, and more than one security expert keeping an eye on things, you might want to give us the opportunity to discuss your issue out of the limelight.
Marek Dušek:
Hello Padraic,
in defence of Jakub's original example, the point is (among others) that the current ZF implementation does not escape quotes properly, allowing an attacker to inject an event handler (or anything really) through the element's attribute (which I don't think the current ZF users are aware of).
Your example, on the other hand, does not need any escaping when entered into "title" as it is harmless. It is not about whether or not to escape javascript in general, but about possible "overflowing" the boundaries of html attributes by not escaping the quotes.
Just my 2 cents.
Marek
Padraic Brady:
>in defence to Jakub's original example, the point is (among others) the
>current ZF implementation does not escape quotes properly, allowing to
>inject an event handler (or anything really) through the element's attribute
>(which I don't think the current ZF users are aware off).
And, with respect to Jakub, he uses single quoted attributes which is bad and doesn't bother to filter the input which is bad. That's bad multiplied by 2.
>Your example, on the other hand, does not need any escaping when entered
>into "title" as it is harmless. It is not about whether or not to escape javascript
>in general, but about possible "overflowing" the boundaries of html attributes
>by not escaping the quotes.
Are you willing to bet that title injections of plain text are harmless? I assure you, you're wrong. The choice of how to write HTML is a known factor in securing your system. For example, any validating XHTML markup tends to be more secure than non-validating markup. And that's before you start with a browser's quirky sense of humour when it comes to how many things it will render unchallenged.
Here, I'll drop an example below in my response to Jakub...
Jakub Vrána :
Contents of the 'title' attribute is usually free text, for example description of something. And there is absolutely no reason to filter out valid characters on the input, for example in the text McDonald's.
If I would use particular data in the 'href' attribute then it would be my responsibility to filter them because there are some restrictions for the safe contents of this attribute. But your example with javascript:alert(document.cookie); inside the 'title' attribute makes no sense - it is perfectly safe in this context and I can present it to user as is.
My last point is that I don't consider this as a security vulnerability in Zend Framework. That's also why I've presented it on my blog and not inside a non-public security list. It's simply a design decision of Zend Framework - the escape() method is well documented at http://framework.zend.com/manual/en/zend.….scripts.escaping and anybody can deduce that using (or not-using) escape() has the same consequences as using (or not-using) htmlspecialchars(). I've just showed that this design decision leads to harder-to-secure applications built on top of Zend Framework.
This article was written as a reply to the statement that "all PHP frameworks defend against XSS". I've just pointed out that Zend Framework doesn't have any built-in protection against XSS - it offers some tools which you can use to defend against XSS but it's your responsibility. Compare this with the built-in XSS protection in Nette Framework - if you write:
<span title='{$userInput}'>Test</span>
then you can be sure that the output would be XSS-safe regardless of the contents of $userInput.
Padraic Brady:
>Contents of the 'title' attribute is usually free text, for example description of something. And
>there is absolutely no reason to filter out valid characters on the input, for example in the text
>McDonald's.
You're missing the point - HTML is written in…plain characters. You should be using a contextual filter, or failing that, at least using better HTML markup practices (i.e. double quotes). Any one layer of protection can fail - that's why it is required to have more than one. What would have fixed the issue you describe?
1. Use a HTML sanitiser
2. Enforce double-quoted attributes
3. Escape all single quotes in text
Your example exploit works by eliminating 1 and 2, and falling back to the final defence of 3. Of course that's a vulnerability…in YOUR application side code and markup. Security in depth, my friend. You will be happy though :P. ZF will use ENT_QUOTES shortly to make its safety net effect a bit better.
>If I would use particular data in the 'href' attribute then it would be my responsibility to filter them
>because there are some restrictions for the safe contents of this attribute. But your example
>with javascript:alert(document.cookie); inside the 'title' attribute makes no sense - it is perfectly
>safe in this context and I can present it to user as is.
I wasn't targeting the actual title attribute - just giving an example of something that escaping would never stop from getting into your HTML (it would work fine in an image src attribute, for example). I can offer quite a few of these, so let's revisit your title attribute in my own contrived example.
As you doubtlessly know, HTML5 allows for…wait for it…quoteless attributes. I love the HTML guys… Any attribute value can be unquoted so long as it applies some rules. Now, let's assume for fun that a user has inserted a user var into a template for an unquoted attribute value. Works for anything, but we'll pretend they were really bad designers and made the title value quoteless. Here's your template:
<!DOCTYPE html>
<head><title>Foobar</title></head>
<body><span title=<?php echo $this->escape($this->userInput); ?>
>Test</span></body>
Now, set $this->userInput to:
x onmouseover=alert(/x/)
Open in Firefox…
If you still think attribute quoting style and escaping are the problem, I'll give you a lot of credit for persistence.
>My last point is that I don't consider this as a security vulnerability in Zend Framework. That's
>also why I've presented it on my blog and not inside a non-public security list. It's simply a
>design decision of Zend Framework - the escape() method is well documented at
>http://framework.zend.com/manual/en/zend.….scripts.escaping and anybody can deduce that
>using (or not-using) escape() has the same consequences as using (or not-using)
>htmlspecialchars(). I've just showed that this design decision leads to harder-to-secure
>applications built on top of Zend Framework.
Appreciated :P. We do get a bit hung up on seeing vulnerabilities being discussed since it's easy for the wrong impression to be given.
>This article was written as a reply to the statement that "all PHP frameworks defend against
>XSS".
Don't look at me - I never said that. Frameworks are great at escaping content - not so much at ensuring that what they escape is XSS-safe or not. That's left to the users to figure out and that's why the focus should be on using better markup, better filters, and being stubbornly paranoid ;).
Jakub Vrána :
Assuming that "any one layer of protection can fail" leads to poorly designed applications which are still full of security holes. If any one layer can fail then as a consequence all layers can fail. Read my former article http://php.vrana.cz/common-mistakes-in-se…-applications.php about good ways to secure web applications.
Single quotes are equally valid as double quotes. Where do you get an impression that using them is a poor practice? Can you point me on some specification telling this?
1. HTML sanitizer makes no sense in this case - there's no HTML in the example and all characters are valid. Stripping random valid characters "just to be sure" leads to bad applications which don't allow me using McDonald's as a name of my company.
2. Enforcing double quotes also makes no sense. Single quotes are valid delimiters of attribute values both in HTML and XML. Requiring double quotes leads to poorly designed applications.
3. This is a valid solution in this case.
The escape() method should provide a valid escaping in all contexts where free text is allowed. That is in between of most tags, in 'title' attribute, in input's 'value' attribute and several different places. Image's 'src' doesn't allow free text - it requires a properly formatted and secure URL.
Yes, I would expect that a good context sensitive escaping function which claims working in HTML will support quoteless attributes. For example this code is perfectly valid and working as expected in XHP:
<span title={$userInput}>Test</span>
Some template engines claim working only in XHTML/XML. The problem of Zend's escape() is that it doesn't support HTML, it doesn't support XML, it supports only subset of valid XML syntax. You can't use escape() function on all places where free text is allowed, only on some of them.
Padraic Brady:
Assuming that any one layer of a defence can fail is called common sense. That's why there are multiple layers to preventing XSS in an application from filtering, to validation, to storing it (properly), all the way to the final act of escaping it. Suggesting that this leads to poor design and security holes (when its purpose is the polar opposite) is being a bit silly.
I never suggested stripping random characters at any point. That would be stupid.
I never suggested using single quotes was forbidden under a HTML standard. That would be stupid too. See below...
Enforcing double quotes for consistency has been typical since htmlspecialchars() never encodes single quotes by default. Ergo, using double quotes... If someone uses single quotes and couldn't RTFM for htmlspecialchars(), they should go back to school and learn to program.
Quoteless attributes make arguments for or against htmlspecialchars() particularly clear - htmlspecialchars is designed to encode a subset of characters with a special meaning in HTML in order to preserve the non-markup content. It was never, ever, not even in Rasmus' gigantic brain, created for the purposes of preventing XSS. That's just a crutch we rely on to supplement all the other stuff we're supposed to be doing BEFORE escaping. For example, it completely ignores all the funny stuff that browsers can render.
Also, I hope that's not Facebook's XHP. Don't get me started on the missing third parameter over there...
Zend's escape() uses htmlspecialchars(). If that's insecure, then the party at fault is whoever writes the C code underpinning htmlspecialchars(). Has this been reported as a PHP bug? Once you elect to go down the htmlspecialchars() there must be something to complement it. I use a HTML Sanitiser and a set of filters - and yes, I use it on plain text. It blocks all the examples I offered earlier.
The problem here is a simple disconnect. You can paint htmlspecialchars() as a saviour and protector against XSS, but that's ridiculous as we both already know. In PHP, it's merely useful for a limited range of XSS values that is completely coincidental to htmlspecialchars() actual purpose (which has zero to do with XSS). The problem is that if people believe it really is a perfect XSS blocker...then there's a serious education problem there.
So it's not that we disagree - we're heatedly arguing the same points, from different perspectives, and not really being productive about it. Maybe we should be - I've dug my editor to see just how far I can push ZF's escape() towards something a bit more comprehensive that it can almost be as secure as people like to think htmlspecialchars() is (even though it isn't).
Juan:
Reading this discussion I'm still not sure if Padraic is really a Zend developer or if it's some special kind of black humor.
I'll try to put here a clear explanation what is it escaping and why doing it correctly protects us against XSS. Let me start quoting your words:
"It ignores the first line of defence which is to correctly filter and sanitise user input. Any HTML filter or sanitiser would have crushed your example long before it reached htmlspecialchars(). So your vulnerability arises from the missing input validation and NOT from the Zend Framework escaping."
Filter and sanitize user input? Could you please explain for what you will filter it when I want to print the data for example as HTML, XML, JSON and LaTex? You will sanitize it for all of those (which means to damage the data)? And what if I don't know how will I represent the data? What if there's some new language in the future? Some new version of HTML maybe?
Every language/context has some characters with special meaning and the only correct way to represent data in a given language is to correctly escape all characters that have special meaning in the given context. Let's look at a very simple example:
Let's suppose we have two languages - Tiger and Zebra. Tiger has special characters a,b,c and Zebra x,y,z. Escaping in both is made with backslash which has no special meaning itself. Now, user wants to send a message (store to your database) which says "Tiger is awesome and excellent!". Supposing we're using Tiger for the output at the moment, what is your "sanitization" of the input? Filtering a,b,c apparently destroys the message, escaping it would produce "Tiger is \awesome \and excellent!". If we now use Tiger to represent the data, we will get the correct sentence and yes - without XSS.
After a few years of escaping thousands of phrases about our excellent Tiger we want to change the representation to Zebra. Now, as I said, the characters with special meaning in Zebra are x, y and z. Let's be precise - x will kill an elephant, y will kill a giraffe and z will kill all dinosaurs. Now, changed to Zebra we'll try to print the message. We will start with "Tiger is \awesome \and e..." which is already damaged with some weird slashes and after that - an elephant is dead! Poor elephant! (Apparently some years ago somebody forgot to escape x.)
I think this practice is really simple and clear and I absolutely don't get this:
"...at least using better HTML markup practices (i.e. double quotes)"
It's like saying "using z in Zebra is not a good practice so we don't need to escape it". (But the dinos are still dead.)
To sum it up, I understand it's just your choice not to protect against XSS automatically. My choice is not to use ZF then.
Padraic Brady:
>To sum it up, I understand it's just your choice not to
>protect against XSS automatically.
It's the same choice that is persistent across most major PHP frameworks, and PHP itself. Since I've already stated I filter the data before it reaches the framework's escaping mechanism, you could try not putting silly words in my mouth. Try a modest bit of common sense - when you escape data, you are filtering it. I just apply a few more filters that are better suited to the context. It doesn't alter the data until it's persistently stored and even then the whole point of escaping is not to lose the information being presented. Let me finish with another quote from me:
"The problem here is a simple disconnect. You can paint htmlspecialchars() as a saviour and protector against XSS, but that's ridiculous as we both already know. In PHP, it's merely useful for a limited range of XSS values that is completely coincidental to htmlspecialchars() actual purpose (which has zero to do with XSS). The problem is that if people believe it really is a perfect XSS blocker...then there's a serious education problem there."
Care to try again?
Juan:
Where are you doing the "filter and sanitize user input"? If it's before printing data, then I misunderstood because "user input" makes impression that it's BEFORE persistence (which would be a huge mistake). If it's on the way out then it's much closer to the correct solution but still - the data has to be correctly escaped IN the given context, not BEFORE - that's a premature escaping, there are some good examples of that in this discussion and I would like to know your answer to this problem.
"It's the same choice that is persistent across most major PHP frameworks, and PHP itself."
The fact that "most of the people/frameworks do it this way" is not an argument for me. If Zend is developed this way, then it will never be better than the "most of..." and it makes Latte and similar templating engines even more exceptional.
To finish the discussion I assure you that as a web developer I will definitely choose the way of automatic context-aware/context-sensitive escaping. That's not an offence against ZF, that's advice to at least think about changing the strategy for this problem.
Padraic Brady:
The filtering is done in the templates via custom view helpers (for ZF1), e.g. escapeAttr(). Since we use partial caching, only the dynamic parts of a page suffer from the slightly more complex function performance so it works fine for us. We based the code around ext/mbstring, some simple hex entity conversions, and a bit of sensible testing to cover off on character encoding and invalid bytes. About the only thing we don't bother with are CSS and <script> embedded javascript since those are beyond user tampering (which makes life simpler) - we just kill anyone who puts variables in that context ;). I haven't checked ZF2's views - they only recently turned up with a possible implementation so there's months to go before that gets completed.
If it makes a difference, I'm discussing upping what we do for escaping in ZF2 and I'm checking to see what others have been doing. It was the whole reason why the reported issue to ZF (on this topic) was reopened by me a while ago after being closed by a fellow developer - closing issues raised along these lines just gets stuff swept under the carpet and ignored instead of being openly discussed (or argued about :P, as we've been doing here).
Jakub Vrána :
For example Symfony comes with templating system Twig which looks pretty solid. Nette comes with Latte which is state-of-the-art in templating systems. Show me a PHP framework which does equally poor job in defense against XSS as Zend Framework.
pajousek:
What about Kohana? :)
Jakub Vrána :
Please specify what exactly do you think we should do BEFORE escaping.
You mentioned HTML sanitization. And I've explained that it makes no sense to HTML sanitize free text. Plus the example I've used doesn't contain any HTML tags at all.
You also mentioned using double quotes instead of single quotes. And I've explained that it is a poor design decision because single quotes are equally valid as double quotes.
The only valid solution is to use an escaping function that is able to escape data correctly in this context. And my suggestion is to provide this escaping function in Zend Framework and make it default as a bonus.
Regarding htmlspecialchars() - it is a low level function and I would expect some higher level function from a framework. Zend Framework provides a higher level method escape() which takes into account encoding but nothing more.
I am horrified about the statement that you "use a HTML Sanitiser and a set of filters" on a plain text. It means that your application unnecessarily strips valid inputs just because you don't know how to properly handle them.
Yes, I am talking about Facebook's XHP. What do you mean by the missing third parameter?
I am far from agreeing that we are arguing the same points from different perspectives. Your suggestions are wrong on many levels.
Please provide a concrete code of your approach to solving this problem.
Padraic Brady:
>Please specify what exactly do you think we should do BEFORE escaping.
Double quote all attributes, filter/validate user input to appropriate rules, limit variable placement in templates, assassinate anyone who forgets to do those three, then escape, escape, escape.
>You mentioned HTML sanitization. And I've explained that it makes no sense to
>HTML sanitize free text. Plus the example I've used doesn't contain any HTML
>tags at all.
Please don't confuse sanitisation with the group of libraries in PHP that claim to be HTML Sanitisers when they have enough holes to drive a truckload of XSS through them…and then continue to distribute their crap when it's pointed out to them because they are completely clueless self-proclaimed experts who don't care about their users. I wrote blog posts on this topic - nearly all of these libraries need to go extinct.
I use HTMLPurifier and a set of sanitisers that can either strip, tidy or escape any given string depending on the, er, context. Emphasis on "escape".
>You also mentioned using double quotes instead of single quotes. And I've
>explained that it is a poor design decision because single quotes are equally
>valid as double quotes.
Feel free to explain to the HTML specs why double quotes are a poor design decision. The enforcement of double quotes for consistency has the upside of preventing attribute breakout from default htmlspecialchars(). We should quit arguing over one of the few obvious characters - there are plenty more not escaped by html functions in PHP (not even by htmlentities come to think of it).
>The only valid solution is to use an escaping function that is able to escape
>data correctly in this context. And my suggestion is to provide this escaping
>function in Zend Framework and make it default as a bonus.
Which I don't disagree with - since I went to the bother of reopening the ZF issue Tomas reported, debating it internally, determining it wasn't a security vulnerability (due to htmlspecialchars own limitations) and then recommending we look into improving it for ZF2's barely born View layer.
>Regarding htmlspecialchars() - it is a low level function and I would expect
>some higher level function from a framework. Zend Framework provides a
>higher level method escape() which takes into account encoding but nothing
>more.
…because there's a presumption that users will do everything else necessary. Let's call a spade a spade - PHP developers following the idea that htmlspecialchars() does everything are likely riddled with XSS. Isn't that what we're both saying?
>I am horrified about the statement that you "use a HTML Sanitiser and a set of
>filters" on a plain text. It means that your application unnecessarily strips valid
>inputs just because you don't know how to properly handle them.
I should explain a bit better. Given htmlspecialchars() is commonly seen as escaping in PHP, I use HTML sanitisation to refer to everything that happens between data persistence and its escaping into a view. My sanitiser borrows a bit from HTMLPurifier and other sources (e.g. OWASP recommendations) and I briefly mentioned in a previous comment where it only does pure escaping (no stripping of content) via view helpers like escapeAttr().
>Yes, I am talking about Facebook's XHP. What do you mean by the missing
>third parameter?
It's the one that sets a character encoding for htmlspecialchars(). I hear it's considered a good idea to use it.
>I am far from agreeing that we are arguing the same points from different
>perspectives. Your suggestions are wrong on many levels.
And they'll always be wrong until you see what they have in common. If you stopped for a moment to read my last few comments you'd have realised a few things. The first is that ZF1 is an ancient framework - the View layer for ZF2 has literally just seen an initial version for testing. The second is that I don't disagree in any way with contextual escaping by element, attribute, or other. The third is that what you're calling escaping, I call sanitisation and I do it via View Helpers in ZF1 - the terminology is a horror show but if I called it escaping it causes confusion with htmlspecialchars() escaping which is the most common offering by far in PHP. The fourth is that I reopened a closed issue just to ensure this was discussed for ZF rather than swept under the carpet and forgotten.
Lastly, before my typical sarcasm and grumpiness kills me ;) (I am a very grumpy person - sorry!), you should understand that your single post might have triggered some changes for the better. If ZF adopts even a modest change, it will be noticed by a lot of other frameworks and it should get this debate into the wider PHP community. Kudos. Keep up the blogging!
Jakub Vrána :
This is what you should do on filtering input of free text:
- Ensure that the byte sequence is valid in target encoding.
- Ensure that it doesn't contain control characters (\x00 - \x1F), sometimes with exception of \t, \r and \n if you want to allow them.
If you are filtering anything else then you are massacring user input with no reason. Using HTMLPurifier on free text is a mistake.
This is what you should do on printing free text to HTML contexts where free text is allowed (between most of the tags and as a quoted value of some attributes like 'title' or 'value'):
- Convert & to &amp;.
- Convert < to &lt; if outside tags.
- Convert " to &quot; if inside double quoted attribute value.
- Convert ' to &#39; if inside single quoted attribute value.
If you are escaping anything else then you are doing useless work (assuming that you use Unicode character set because nothing really works in other character sets anyway). The result is that htmlspecialchars() with ENT_QUOTES provides sufficient escaping for HTML contexts where free text is allowed - it actually escapes more characters than necessary.
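As a small sketch of that claim, the following applies htmlspecialchars() with ENT_QUOTES to the attack string from the article. The wrapper name escapeFreeText is hypothetical, and UTF-8 is assumed.

```php
<?php
// Hypothetical wrapper (illustration only): escaping for HTML contexts
// where free text is allowed, i.e. element content and quoted attributes.
function escapeFreeText($text)
{
    // ENT_QUOTES converts both " and ', so either quoting style is covered.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$userInput = "' onmouseover='alert(/XSS/)";
$escaped = escapeFreeText($userInput);

// The apostrophes are now character references, so the value
// cannot terminate the single quoted attribute.
$html = "<span title='$escaped'>Test</span>";
```

Without ENT_QUOTES the apostrophe stays as it is, which is the problem the article describes for the default $this->escape().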
Forbidding single quotes around attribute values is poor practice because no validator will tell you that you are doing anything wrong.
XHP is able to parse inputs where <, &, ", ', > and several other characters are represented by the same bytes as in US-ASCII (which is the case in most other single-byte encodings and also in UTF-8). htmlspecialchars() without a specified encoding interprets these bytes as those characters, so there is no reason to specify the encoding - nothing would change. XHP doesn't support encodings where these characters are represented by different byte sequences (e.g. UTF-16).
The craziness in ZF2 views is more than 4 months old: https://github.com/zendframework/zf2/commit/e83157d2e98e44c21f2cf8931421d5a3d47fd852. It says quite something about the Zend Framework community that nobody noticed this (or cared about it) for such a long time, even though it is a development version (in beta for 4 months as well). And it is really sad that two developers who don't use ZF at all (me and David Grudl) have to point out several serious problems in it.
David Grudl:
Another example of meaningless premature escaping:
<?php
$view->key = $key;
$view->arr = array($key => 'Hello');
?>
And template:
<?php
echo $view->arr[$view->key];
?>
Sometimes it writes out "Hello", sometimes it triggers E_NOTICE. Is this the intended behavior of a serious framework?
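A rough sketch of why this happens, assuming (as the quoted roadmap says) that variables read through __get() are escaped while array contents are not. The EscapingView class and the lookup() function below are hypothetical stand-ins, not the real ZF2 code.

```php
<?php
// Hypothetical stand-in for a view that escapes strings on __get().
// It is NOT the ZF2 implementation, only an illustration of the effect.
class EscapingView
{
    private $vars = array();

    public function __set($name, $value)
    {
        $this->vars[$name] = $value;
    }

    public function __get($name)
    {
        $value = $this->vars[$name];
        // Strings are escaped on read; arrays are returned untouched.
        return is_string($value)
            ? htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
            : $value;
    }
}

// Returns the array value found under the (escaped) key, or null if the
// key is missing, which is where the template would raise E_NOTICE.
function lookup($key)
{
    $view = new EscapingView();
    $view->key = $key;
    $view->arr = array($key => 'Hello');

    $arr = $view->arr;        // keys are still the raw strings
    $escapedKey = $view->key; // the key comes back escaped
    return array_key_exists($escapedKey, $arr) ? $arr[$escapedKey] : null;
}
```

With a key such as "a" the lookup succeeds, but with "a<b" the view hands back "a&lt;b", which is not a key of the array.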
Padraic Brady:
Don't forget the forward slash in your escaping... I hear it's used to terminate element tags. I could be wrong - I never did get around to reading the HTML 4.01 spec they released last year.
Jakub Vrána:
Character '/' has no special meaning in HTML contexts where free text is allowed: <span title="/">/</span> is perfectly valid and displays '/' with title '/'.
Padraic Brady:
But only if the attribute is quoted...
You're being selective again. Quoteless attribute values are just as valid as single quotes, which are just as valid as double quotes for HTML 5 - no validator on the planet will notice anything wrong if you're deliberately validating towards a preferred quoting style.
In that respect either you're recommending quotes (which is just as poor and ill advised as double quotes since singles are no more valid than those are apparently), or you're recommending not to filter/escape the attribute values for something that can terminate tags, include javascript comments, fake a space character or act as a javascript string boundary.
While we dance around these details, don't you get the feeling htmlspecialchars() is a little less robust than PHP developers tend to think? How much of the PHP literature ever mentions quoting vs quoteless as being something the average programmer should even pay attention to?
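The quoted versus quoteless distinction raised here can be illustrated with a short sketch. The payload is an assumption chosen for illustration; it contains none of the characters that htmlspecialchars() converts.

```php
<?php
// Illustrative payload (an assumption): no <, >, &, " or ' in it.
$userInput = 'x onmouseover=alert(1)';

// htmlspecialchars() has nothing to convert, so the string is unchanged.
$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// Quoteless attribute: the space ends the value, so the rest of the
// string is parsed as a separate onmouseover attribute.
$quoteless = "<span title=$escaped>Test</span>";

// Quoted attribute: the whole string stays inside the title value.
$quoted = "<span title=\"$escaped\">Test</span>";
```

The escaping function is the same in both cases; only the quoting of the attribute in the template differs.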
Tharos:
Please read this note of Jakub's more carefully:
"This is what you should do on printing free text to HTML contexts where free text is allowed (between most of the tags and as a quoted value of some attributes like 'title' or 'value'): ..."
Jakub is talking about quoted values of attributes here...
Jakub Vrána:
I've clearly stated that I am writing about HTML contexts where free text is allowed. Quoteless attribute values don't allow free text because they are limited to characters [-a-zA-Z0-9._:]. Putting anything else as a value will cause a validation error.
Your rant about JavaScript comments is even more meaningless - they have no special meaning in HTML contexts where free text is allowed (e.g. in 'title' attribute). I never said that htmlspecialchars is sufficient in contexts where free text is not allowed (such as 'onclick' or 'href' attributes).
You clearly have no conception of the different contexts and the characters which have a special meaning inside them. This understanding is necessary for securing a web application.
Padraic Brady:
You said quoted, not double quoted. If you intended double quoted then I'm more than happy to apologise for being too obscure in my comment. If you intended single quotes, then we're back to my argument being valid since the underlying assumptions are based on htmlspecialchars() using ENT_QUOTES which is almost never consistently applied in PHP (including quite a few frameworks) which is why, in my opinion at least, double quoting is essential to offset the risk of inconsistent escaping.
You can actually validate HTML5 with far more than the free text valid characters you specify - it IS valid to use forward slashes there in a quoteless context...and practically anything else other than a space (or something the browser interprets as a space).
Thirdly, there's no need to get presumptuous about my competence. I'm well aware of context specific escaping. My problem with PHP in general is that we're over-reliant on something designed only for escaping HTML element content in the best possible controlled environment. This is rarely reflective of reality. If we wanted near perfect escaping, we'd need to escape a lot more characters and be far more careful about context. That would block even XSS from quoteless attribute values.
Again, sorry for not being specific about this in my last comment.
Jakub Vrána:
In HTML 4, <span title='text'> allows free text, <span title="text"> allows free text, <span title=text> doesn't allow free text. In XML, <span title=text> isn't allowed at all. If you are using tools that are not able to properly handle free text in contexts allowing free text (such as ZF) then the tools are wrong.
You are right that quoteless attributes allow free text (including space) in HTML5 - <span title=<&x20;&> is valid. If a tool claims support for HTML5 then it should also handle this context properly. There are however not many tools claiming full support for HTML5.
I've listed all special characters at http://php.vrana.cz/defense-against-xss-in-….php#d-13003. There's nothing else you need to escape in HTML 4 or XML contexts allowing free text.
Alexey:
Latte - nice.
But what do you think about that?
<script>{{variable}}</script>
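For comparison, one common way to print a value inside <script> in plain PHP is to encode it as a JavaScript literal with json_encode(). This is only a sketch and says nothing about how Latte or any other templating system handles this context. The helper name escapeJs is hypothetical, and JSON_THROW_ON_ERROR assumes PHP 7.3 or newer.

```php
<?php
// Hypothetical helper (illustration only): encodes a PHP value as a
// JavaScript literal for printing inside <script>...</script>.
function escapeJs($value)
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes,
    // so the output cannot contain a literal "</script>".
    // JSON_THROW_ON_ERROR raises an exception on e.g. invalid UTF-8.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$variable = '</script><script>alert(/XSS/)</script>';
$html = '<script>var v = ' . escapeJs($variable) . ';</script>';
```

htmlspecialchars() is the wrong tool in this context, because character references are not decoded inside <script> in HTML.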
The discussion has been closed due to spam.