comparison vendor/zendframework/zend-escaper/doc/book/theory-of-operation.md @ 0:4c8ae668cc8c
Initial import (non-working)
author | Chris Cannam |
---|---|
date | Wed, 29 Nov 2017 16:09:58 +0000 |
parents | |
children |
# Theory of Operation

zend-escaper provides methods for escaping output data according to the context
in which the data will be used. Each method is based on peer-reviewed rules and
complies with the current OWASP recommendations.

The escaping follows a well-known and fixed set of encoding rules defined by
OWASP for each key HTML context. These rules cannot be impacted or negated by
browser quirks or edge-case HTML parsing unless the browser suffers a
catastrophic bug in its HTML parser or Javascript interpreter; both of
these are unlikely.

The contexts in which zend-escaper should be used are **HTML Body**, **HTML
Attribute**, **Javascript**, **CSS**, and **URL/URI** contexts.

Every escaper method takes the data to be escaped, makes sure it is UTF-8
encoded (or tries to convert it to UTF-8), performs context-based escaping,
encodes the escaped data back to its original encoding, and returns the data to
the caller.

The actual escaping differs between methods; each has its own set of rules.
The following example demonstrates the difference, and how the same characters
are escaped differently between contexts:

```php
$escaper = new Zend\Escaper\Escaper('utf-8');

// &lt;script&gt;alert(&quot;zf2&quot;)&lt;/script&gt;
echo $escaper->escapeHtml('<script>alert("zf2")</script>');

// &lt;script&gt;alert&#x28;&quot;zf2&quot;&#x29;&lt;&#x2F;script&gt;
echo $escaper->escapeHtmlAttr('<script>alert("zf2")</script>');

// \x3Cscript\x3Ealert\x28\x22zf2\x22\x29\x3C\x2Fscript\x3E
echo $escaper->escapeJs('<script>alert("zf2")</script>');

// \3C script\3E alert\28 \22 zf2\22 \29 \3C \2F script\3E
echo $escaper->escapeCss('<script>alert("zf2")</script>');

// %3Cscript%3Ealert%28%22zf2%22%29%3C%2Fscript%3E
echo $escaper->escapeUrl('<script>alert("zf2")</script>');
```

More detailed examples will be given in later chapters.
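
The convert, escape, convert-back sequence described earlier can be sketched in plain PHP. This is only an illustration of the flow, not zend-escaper's code: `escapeWithEncoding()` and `$htmlBody` are hypothetical names, and the sketch assumes PHP 7+ with the mbstring extension available.

```php
<?php
// Hypothetical helper, not part of zend-escaper: it only mirrors the
// convert -> escape -> convert-back sequence described in the text.
function escapeWithEncoding(string $data, string $encoding, callable $escapeUtf8): string
{
    $isUtf8 = strtolower($encoding) === 'utf-8';

    // Step 1: normalise the input to UTF-8 so the escaping rules see one encoding.
    $utf8 = $isUtf8 ? $data : mb_convert_encoding($data, 'UTF-8', $encoding);

    // Refuse to escape byte sequences that are not valid UTF-8.
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8 after conversion');
    }

    // Step 2: perform the context-specific escaping on the UTF-8 string.
    $escaped = $escapeUtf8($utf8);

    // Step 3: hand the result back in the caller's original encoding.
    return $isUtf8 ? $escaped : mb_convert_encoding($escaped, $encoding, 'UTF-8');
}

// Stand-in for an HTML Body escaper that operates on UTF-8 input.
$htmlBody = function (string $s): string {
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
};

$latin1 = "caf\xE9 <b>"; // "café <b>" encoded as ISO-8859-1
$result = escapeWithEncoding($latin1, 'ISO-8859-1', $htmlBody);
// $result is still ISO-8859-1: the \xE9 byte survives and the tag is escaped.
```

Passing the context escaper in as a callable keeps the encoding handling identical for every context, which is the property the paragraph above describes.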

## The Problem with Inconsistent Functionality

At present, programmers typically reach for the following PHP functions for
each common HTML context:

- **HTML Body**: `htmlspecialchars()` or `htmlentities()`
- **HTML Attribute**: `htmlspecialchars()` or `htmlentities()`
- **Javascript**: `addslashes()` or `json_encode()`
- **CSS**: n/a
- **URL/URI**: `rawurlencode()` or `urlencode()`

In practice, these decisions appear to depend more on what PHP offers, and
whether it can be interpreted as offering sufficient escaping safety, than they
do on what is actually recommended to defend against XSS. While these functions
can prevent some forms of XSS, they do not cover all use cases or risks and are
therefore insufficient defenses.

Using `htmlspecialchars()` on a perfectly valid HTML5 unquoted attribute value,
for example, is completely useless since the value can be terminated by a space
(among other things), which is never escaped. Thus, in this instance, we have a
conflict between a widely used HTML escaper and a modern HTML specification,
with no specific function available to cover this use case. While it's tempting
to blame users, or the HTML specification authors, escaping just needs to deal
with whatever HTML and browsers allow.

Using `addslashes()`, custom backslash escaping, or `json_encode()` will
typically ignore HTML special characters such as ampersands, which may be used
to inject entities into Javascript. Under the right circumstances, the browser
will convert these entities into their literal equivalents before interpreting
Javascript, thus allowing attackers to inject arbitrary code.

Inconsistencies with valid HTML, insecure default parameters, lack of character
encoding awareness, and misrepresentations by some programmers of what functions
are capable of all make escaping in PHP an unnecessarily convoluted quest.

To compensate for the lack of escaping methods in PHP, zend-escaper addresses
the need to apply context-specific escaping in web applications. It implements
methods that specifically target XSS and offers programmers a tool to secure
their applications without misusing other inadequate methods or resorting to
home-grown solutions that are most likely incomplete.
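
The unquoted-attribute problem described in this section is easy to reproduce with nothing but built-in PHP. In the sketch below, the payload and the `<input>` template are hypothetical; the point is only that `htmlspecialchars()` finds nothing to escape.

```php
<?php
// Attacker-controlled value destined for an unquoted attribute.
$userInput = 'x onmouseover=alert(1)';

// The value contains no <, >, &, or quote characters, so nothing is changed.
$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// Hypothetical template that leaves the attribute value unquoted.
$html = '<input value=' . $escaped . '>';

echo $html, PHP_EOL;
// <input value=x onmouseover=alert(1)>
```

The space ends the `value` attribute, so the browser treats `onmouseover` as a second, attacker-supplied attribute.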

## Why Contextual Escaping?

The following points explain why multiple standardised escaping methods are
needed; they are by no means a complete set of reasons.

### HTML escaping of unquoted HTML attribute values still allows XSS

This is probably the best known way to defeat `htmlspecialchars()` when used on
attribute values, since any space (or any character interpreted as a space, of
which there are many) lets you inject new attributes whose content can't be
neutralised by HTML escaping. The solution (where this is possible) is
additional escaping as defined by the OWASP ESAPI codecs. The point here can be
extended further: escaping only works if a programmer or designer knows what
they're doing. In many contexts, there are additional practices and gotchas
that need to be carefully monitored, since escaping sometimes needs a little
extra help to protect against XSS, even if that means ensuring all attribute
values are properly double quoted despite this not being required for valid
HTML.
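
As a rough illustration of the "additional escaping" idea, the sketch below turns every character outside a small safe set into a hexadecimal entity, so a space can no longer end an attribute value. `sketchEscapeHtmlAttr()` is a hypothetical, simplified function, not zend-escaper's implementation or the ESAPI codec; it assumes PHP 7.2+ (for `mb_ord()`) and valid UTF-8 input, and it ignores edge cases such as control characters.

```php
<?php
// Simplified sketch (hypothetical function, NOT zend-escaper's implementation):
// every character outside a small safe set becomes a hexadecimal entity.
function sketchEscapeHtmlAttr(string $value): string
{
    $escaped = preg_replace_callback(
        '/[^a-zA-Z0-9,._-]/u',
        function (array $match): string {
            // mb_ord() returns the Unicode code point of the matched character.
            return sprintf('&#x%02X;', mb_ord($match[0], 'UTF-8'));
        },
        $value
    );

    // With the /u modifier, preg_replace_callback() returns null on invalid UTF-8.
    if ($escaped === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $escaped;
}

$escaped = sketchEscapeHtmlAttr('x onmouseover=alert(1)');

echo '<input value=' . $escaped . '>', PHP_EOL;
// <input value=x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;>
```

Even with the attribute left unquoted, the space and the equals sign are now entities, so the whole payload stays inside the `value` attribute.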

### HTML escaping of CSS, Javascript or URIs is often reversed when passed to non-HTML interpreters by the browser

HTML escaping is just that: it's designed to escape a string for HTML
(i.e. prevent tag or attribute insertion), but not alter the underlying meaning
of the content, whether it be text, Javascript, CSS, or URIs. As a result,
a fully HTML-escaped version of any other context may still have its unescaped
form extracted before it's interpreted or executed. For this reason we need
separate escapers for Javascript, CSS, and URIs, and developers or designers
writing templates **must** know which escaper to apply to which context. Of
course, this means you need to be able to identify the correct context before
selecting the right escaper!
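
The reversal can be simulated in PHP. In this sketch, `html_entity_decode()` stands in for the browser decoding an attribute value before handing it to the Javascript engine; the `greet()` handler and the payload are hypothetical.

```php
<?php
// Attacker-controlled value placed inside a Javascript string in an event handler.
$userInput = "'); alert(1); //";

// HTML escaping turns the single quote into the entity &#039;.
$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// Attribute value as sent to the browser, e.g. inside onclick="...".
$attribute = "greet('" . $escaped . "')";

// The browser decodes entities in the attribute value before running it as
// Javascript; html_entity_decode() stands in for that step here.
$seenByJsEngine = html_entity_decode($attribute, ENT_QUOTES, 'UTF-8');

echo $attribute, PHP_EOL;
// greet('&#039;); alert(1); //')
echo $seenByJsEngine, PHP_EOL;
// greet(''); alert(1); //')
```

The HTML escaping was applied correctly, yet the Javascript engine still sees the original quote, which closes the string and lets `alert(1)` run as a separate statement.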

### DOM-based XSS requires a defence using at least two levels of different escaping in many cases

DOM-based XSS has become increasingly common as Javascript has taken off in
popularity for large scale client-side coding. A simple example is Javascript
defined in a template which inserts a new piece of HTML text into the DOM. If
the string is only HTML escaped, it may still contain Javascript that will
execute in that context. If the string is only Javascript-escaped, it may
contain HTML markup (new tags and attributes) which will be injected into the
DOM and parsed once the inserting Javascript executes. Damned either way? The
solution is to escape twice: first escape the string for HTML (make it
safe for DOM insertion), and then for Javascript (make it safe for the current
Javascript context). Nested contexts are a common means of bypassing naive
escaping habits (e.g. you can inject Javascript into a CSS expression within an
HTML attribute).
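
A minimal sketch of the two-step escaping, assuming a template that assigns the value to `innerHTML`. `sketchEscapeJs()` is a hypothetical, simplified Javascript escaper written for this example (it is not zend-escaper's `escapeJs()`), and it assumes PHP 7.2+ with mbstring and valid UTF-8 input.

```php
<?php
// Simplified sketch (hypothetical function): hex-escape every character
// outside a small safe set so the value cannot leave a Javascript string.
function sketchEscapeJs(string $value): string
{
    $escaped = preg_replace_callback(
        '/[^a-zA-Z0-9,._]/u',
        function (array $match): string {
            $codePoint = mb_ord($match[0], 'UTF-8');
            if ($codePoint < 0x100) {
                return sprintf('\\x%02X', $codePoint);
            }
            if ($codePoint < 0x10000) {
                return sprintf('\\u%04X', $codePoint);
            }
            // Characters beyond the BMP are written as a UTF-16 surrogate pair.
            $codePoint -= 0x10000;
            return sprintf('\\u%04X\\u%04X', 0xD800 | ($codePoint >> 10), 0xDC00 | ($codePoint & 0x3FF));
        },
        $value
    );

    if ($escaped === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $escaped;
}

$userInput = '<img src=x onerror=alert(1)>';

// First level: make the value safe for insertion into the DOM as HTML text.
$forHtml = htmlspecialchars($userInput, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Second level: make the HTML-escaped value safe inside a Javascript string.
$forJs = sketchEscapeJs($forHtml);

echo "el.innerHTML = '" . $forJs . "';", PHP_EOL;
```

When the script runs, the Javascript string evaluates back to the HTML-escaped text, so the DOM receives `&lt;img ...&gt;` and renders it as plain text rather than as a tag.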

### PHP has no known anti-XSS escape functions (only those kidnapped from their original purposes)

A simple and widely seen example is `json_encode()` used to escape
Javascript, or worse, some kind of mutant `addslashes()` implementation. These
were never designed to eliminate XSS, yet PHP programmers use them as such. For
example, `json_encode()` does not escape the ampersand or semi-colon characters
by default. That means you can easily inject HTML entities which could then be
decoded before the Javascript is evaluated in an HTML document. This lets you
break out of strings, add new JS statements, close tags, etc. In other words,
using `json_encode()` is insufficient and naive. The same, arguably, could be
said for `htmlspecialchars()`, which has its own well known limitations that
make a singular reliance on it a questionable practice.
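
The `json_encode()` behaviour can be checked directly. In the sketch below, the payload and the `log()` handler are hypothetical, and `html_entity_decode()` again stands in for the browser decoding entities before the Javascript is evaluated.

```php
<?php
// Attacker-controlled value containing HTML entities instead of literal quotes.
$userInput = '&quot;);alert(1);(&quot;';

// By default json_encode() leaves ampersands and semicolons untouched.
$json = json_encode($userInput);
echo $json, PHP_EOL;
// "&quot;);alert(1);(&quot;"

// If the result lands in an HTML attribute, entities are decoded before the
// Javascript runs; html_entity_decode() simulates that step.
$seenByJsEngine = html_entity_decode('log(' . $json . ')', ENT_QUOTES, 'UTF-8');
echo $seenByJsEngine, PHP_EOL;
// log("");alert(1);("")

// The JSON_HEX_AMP flag makes json_encode() emit \u0026 for the ampersand.
$hardened = json_encode($userInput, JSON_HEX_AMP);
echo $hardened, PHP_EOL;
// "\u0026quot;);alert(1);(\u0026quot;"
```

The `JSON_HEX_*` flags narrow this particular gap, but they have to be remembered on every call, which is part of why the text argues for a dedicated, context-specific escaper.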