isophonics-drupal-site: vendor/zendframework/zend-escaper/doc/book/theory-of-operation.md annotate

annotate vendor/zendframework/zend-escaper/doc/book/theory-of-operation.md @ 8:50b0d041100e

author	Chris Cannam
date	Mon, 05 Feb 2018 10:56:40 +0000
parents	4c8ae668cc8c
children

rev	line source
Chris@0	1 # Theory of Operation
Chris@0	2
Chris@0	3 zend-escaper provides methods for escaping output data, dependent on the context
Chris@0	4 in which the data will be used. Each method is based on peer-reviewed rules and
Chris@0	5 is in compliance with the current OWASP recommendations.
Chris@0	6
Chris@0	7 The escaping follows a well-known and fixed set of encoding rules defined by
Chris@0	8 OWASP for each key HTML context. These rules cannot be impacted or negated by
Chris@0	9 browser quirks or edge-case HTML parsing unless the browser suffers a
Chris@0	10 catastrophic bug in its HTML parser or Javascript interpreter — both of
Chris@0	11 these are unlikely.
Chris@0	12
Chris@0	13 The contexts in which zend-escaper should be used are **HTML Body, HTML
Chris@0	14 Attribute, Javascript, CSS, and URL/URI** contexts.
Chris@0	15
Chris@0	16 Every escaper method will take the data to be escaped, make sure it is utf-8
Chris@0	17 encoded data (or try to convert it to utf-8), perform context-based escaping,
Chris@0	18 encode the escaped data back to its original encoding, and return the data to
Chris@0	19 the caller.
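
The round trip described above can be sketched with PHP built-ins. This is only an illustration under stated assumptions, not zend-escaper's actual implementation: `convertEncoding()` and `escapeInEncoding()` are hypothetical helper names, and `htmlspecialchars()` merely stands in for the context-specific escaping step.

```php
/**
 * Convert $string between encodings with whichever extension is available.
 * Hypothetical helper for this sketch, not part of zend-escaper's API.
 */
function convertEncoding(string $string, string $to, string $from): string
{
    if (function_exists('iconv')) {
        $result = iconv($from, $to, $string);
    } elseif (function_exists('mb_convert_encoding')) {
        $result = mb_convert_encoding($string, $to, $from);
    } else {
        throw new RuntimeException('Neither iconv nor mbstring is available');
    }
    if ($result === false) {
        throw new RuntimeException("Could not convert from $from to $to");
    }
    return $result;
}

/** Hypothetical wrapper: normalise to UTF-8, escape, convert back. */
function escapeInEncoding(string $data, string $encoding, callable $escape): string
{
    $isUtf8 = strcasecmp($encoding, 'utf-8') === 0;

    // 1. Normalise the input to UTF-8.
    $utf8 = $isUtf8 ? $data : convertEncoding($data, 'UTF-8', $encoding);

    // 2. Reject anything that is not valid UTF-8 (the /u modifier validates).
    if ($utf8 !== '' && !preg_match('/^./su', $utf8)) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // 3. Apply the context-specific escaping.
    $escaped = $escape($utf8);

    // 4. Hand the result back in the caller's original encoding.
    return $isUtf8 ? $escaped : convertEncoding($escaped, $encoding, 'UTF-8');
}

$latin1 = "caf\xE9 <b>"; // "café <b>" encoded as ISO-8859-1
$result = escapeInEncoding($latin1, 'ISO-8859-1', function (string $s): string {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
});
// $result is "caf\xE9 &lt;b&gt;", still encoded as ISO-8859-1
```

Doing the escaping on a single known encoding keeps the escaping rules themselves independent of whatever encoding the application happens to use.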
Chris@0	20
Chris@0	21 The actual escaping of the data differs between each method; they all have their
Chris@0	22 own set of rules according to which escaping is performed. An example will allow
Chris@0	23 us to clearly demonstrate the difference, and how the same characters are being
Chris@0	24 escaped differently between contexts:
Chris@0	25
Chris@0	26 ```php
Chris@0	27 $escaper = new Zend\Escaper\Escaper('utf-8');
Chris@0	28
Chris@0	29 // &lt;script&gt;alert(&quot;zf2&quot;)&lt;/script&gt;
Chris@0	30 echo $escaper->escapeHtml('<script>alert("zf2")</script>');
Chris@0	31
Chris@0	32 // &lt;script&gt;alert&#x28;&quot;zf2&quot;&#x29;&lt;&#x2F;script&gt;
Chris@0	33 echo $escaper->escapeHtmlAttr('<script>alert("zf2")</script>');
Chris@0	34
Chris@0	35 // \x3Cscript\x3Ealert\x28\x22zf2\x22\x29\x3C\x2Fscript\x3E
Chris@0	36 echo $escaper->escapeJs('<script>alert("zf2")</script>');
Chris@0	37
Chris@0	38 // \3C script\3E alert\28 \22 zf2\22 \29 \3C \2F script\3E
Chris@0	39 echo $escaper->escapeCss('<script>alert("zf2")</script>');
Chris@0	40
Chris@0	41 // %3Cscript%3Ealert%28%22zf2%22%29%3C%2Fscript%3E
Chris@0	42 echo $escaper->escapeUrl('<script>alert("zf2")</script>');
Chris@0	43 ```
Chris@0	44
Chris@0	45 More detailed examples will be given in later chapters.
Chris@0	46
Chris@0	47 ## The Problem with Inconsistent Functionality
Chris@0	48
Chris@0	49 At present, programmers tend to reach for the following PHP functions for each
Chris@0	50 common HTML context:
Chris@0	51
Chris@0	52 - HTML Body: `htmlspecialchars()` or `htmlentities()`
Chris@0	53 - HTML Attribute: `htmlspecialchars()` or `htmlentities()`
Chris@0	54 - Javascript: `addslashes()` or `json_encode()`
Chris@0	55 - CSS: n/a
Chris@0	56 - URL/URI: `rawurlencode()` or `urlencode()`
Chris@0	57
Chris@0	58 In practice, these choices appear to depend more on what PHP offers, and whether
Chris@0	59 it can be interpreted as offering sufficient escaping safety, than on what is
Chris@0	60 actually recommended to defend against XSS. While these functions can
Chris@0	61 prevent some forms of XSS, they do not cover all use cases or risks and are
Chris@0	62 therefore insufficient defenses.
Chris@0	63
Chris@0	64 Using `htmlspecialchars()` in a perfectly valid HTML5 unquoted attribute value,
Chris@0	65 for example, is completely useless since the value can be terminated by a space
Chris@0	66 (among other things), which is never escaped. Thus, in this instance, we have a
Chris@0	67 conflict between a widely used HTML escaper and a modern HTML specification,
Chris@0	68 with no specific function available to cover this use case. While it's tempting
Chris@0	69 to blame users, or the HTML specification authors, escaping just needs to deal
Chris@0	70 with whatever HTML and browsers allow.
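
A minimal sketch of the unquoted-attribute problem, using only `htmlspecialchars()`. The payload and the surrounding markup are made up for illustration.

```php
// Attacker-controlled value that contains none of the characters
// htmlspecialchars() rewrites (& " ' < >).
$userInput = 'x onmouseover=alert(1)';

$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// The value is interpolated into an unquoted attribute.
$html = '<input type=text value=' . $escaped . '>';

// $escaped is identical to $userInput, so the space ends the value and
// the remainder is parsed as a second attribute:
// <input type=text value=x onmouseover=alert(1)>
echo $html, PHP_EOL;
```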
Chris@0	71
Chris@0	72 Using `addslashes()`, custom backslash escaping, or `json_encode()` will
Chris@0	73 typically ignore HTML special characters such as ampersands, which may be used
Chris@0	74 to inject entities into Javascript. Under the right circumstances, the browser
Chris@0	75 will convert these entities into their literal equivalents before interpreting
Chris@0	76 Javascript, thus allowing attackers to inject arbitrary code.
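
The entity problem can be illustrated roughly as follows. This sketch makes two assumptions: `greet` is a hypothetical Javascript function, and `html_entity_decode()` is used only to approximate the decoding a browser applies to an attribute value before the script is interpreted.

```php
// Payload built from an HTML entity instead of a literal quote.
$userInput = '&#39;);alert(1);//';

// addslashes() only touches ', ", \ and NUL, so nothing changes here.
$jsArgument = addslashes($userInput);

// The value ends up inside a Javascript string in an event attribute.
$attributeValue = "greet('" . $jsArgument . "')";
$html = '<a href="#" onclick="' . $attributeValue . '">Hi</a>';

// Approximation of what reaches the Javascript interpreter once the
// browser has decoded the attribute value.
$seenByJavascript = html_entity_decode($attributeValue, ENT_QUOTES, 'UTF-8');

// greet('');alert(1);//')
echo $seenByJavascript, PHP_EOL;
```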
Chris@0	77
Chris@0	78 Inconsistencies with valid HTML, insecure default parameters, lack of character
Chris@0	79 encoding awareness, and misrepresentations by some programmers of what functions
Chris@0	80 are capable of: these all make escaping in PHP an unnecessarily
Chris@0	81 convoluted quest.
Chris@0	82
Chris@0	83 To compensate for the lack of purpose-built escaping functions in PHP,
Chris@0	84 zend-escaper provides the context-specific escaping that web applications need.
Chris@0	85 It implements methods that specifically target XSS and offers programmers a tool
Chris@0	86 to secure their applications without misusing other inadequate methods or
Chris@0	87 relying on home-grown solutions that are most likely incomplete.
Chris@0	88
Chris@0	89 ## Why Contextual Escaping?
Chris@0	90
Chris@0	91 The following quick points illustrate why multiple standardised escaping
Chris@0	92 methods are needed; they are by no means a complete set of
Chris@0	93 reasons, however!
Chris@0	94
Chris@0	95 ### HTML escaping of unquoted HTML attribute values still allows XSS
Chris@0	96
Chris@0	97 This is probably the best known way to defeat `htmlspecialchars()` when used on
Chris@0	98 attribute values, since any space (or character interpreted as a space —
Chris@0	99 there are a lot) lets you inject new attributes whose content can't be
Chris@0	100 neutralised by HTML escaping. The solution (where this is possible) is
Chris@0	101 additional escaping as defined by the OWASP ESAPI codecs. The point here can be
Chris@0	102 extended further — escaping only works if a programmer or designer knows
Chris@0	103 what they're doing. In many contexts, there are additional practices and gotchas
Chris@0	104 that need to be carefully monitored since escaping sometimes needs a little
Chris@0	105 extra help to protect against XSS — even if that means ensuring all
Chris@0	106 attribute values are properly double quoted despite this not being required for
Chris@0	107 valid HTML.
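
The kind of additional escaping meant here can be sketched as below. `escapeAttrSketch()` is a hypothetical, deliberately simplified helper in the spirit of the OWASP rules, not the code zend-escaper ships: it only handles ASCII and passes bytes above 0x7F through untouched, which a real escaper has to treat far more carefully.

```php
/**
 * Hypothetical ASCII-only sketch of attribute escaping: every ASCII
 * character other than letters, digits, and , . - _ becomes &#xHH;.
 * Bytes >= 0x80 are left as they are (a simplification).
 */
function escapeAttrSketch(string $value): string
{
    return preg_replace_callback(
        '/[^a-zA-Z0-9,._\-\x80-\xFF]/',
        function (array $match): string {
            return sprintf('&#x%02X;', ord($match[0]));
        },
        $value
    );
}

$escaped = escapeAttrSketch('x onmouseover=alert(1)');

// x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;
echo $escaped, PHP_EOL;

// The encoded space no longer terminates the unquoted value.
echo '<input type=text value=' . $escaped . '>', PHP_EOL;
```

Escaping everything outside a small allowlist, rather than a handful of known-dangerous characters, is what makes the result hold up even when the attribute is left unquoted.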
Chris@0	108
Chris@0	109 ### HTML escaping of CSS, Javascript or URIs is often reversed when passed to non-HTML interpreters by the browser
Chris@0	110
Chris@0	111 HTML escaping is just that: it's designed to escape a string for HTML
Chris@0	112 (i.e. prevent tag or attribute insertion), but not alter the underlying meaning
Chris@0	113 of the content, whether it be text, Javascript, CSS, or URIs. As a result,
Chris@0	114 a fully HTML-escaped version of any other context may still have its unescaped
Chris@0	115 form extracted before it's interpreted or executed. For this reason we need
Chris@0	116 separate escapers for Javascript, CSS, and URIs, and developers or designers
Chris@0	117 writing templates must know which escaper to apply to which context. Of
Chris@0	118 course, this means you need to be able to identify the correct context before
Chris@0	119 selecting the right escaper!
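
A small sketch of HTML escaping being undone before a non-HTML interpreter sees the value. As an assumption, `track` is a hypothetical Javascript function, and `html_entity_decode()` again only approximates the browser's handling of the attribute value.

```php
$userInput = "');alert(1);//";

// HTML escaping turns the quote into an entity ...
$htmlEscaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');
// &#039;);alert(1);//

// ... so the markup itself stays intact: no new tags or attributes.
$attributeValue = "track('" . $htmlEscaped . "')";
$html = '<button onclick="' . $attributeValue . '">Go</button>';

// The browser decodes the attribute value before running it as
// Javascript, which restores the original quote.
$seenByJavascript = html_entity_decode($attributeValue, ENT_QUOTES, 'UTF-8');

// track('');alert(1);//')
echo $seenByJavascript, PHP_EOL;
```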
Chris@0	120
Chris@0	121 ### DOM-based XSS requires a defence using at least two levels of different escaping in many cases
Chris@0	122
Chris@0	123 DOM-based XSS has become increasingly common as Javascript has taken off in
Chris@0	124 popularity for large scale client-side coding. A simple example is Javascript
Chris@0	125 defined in a template which inserts a new piece of HTML text into the DOM. If
Chris@0	126 the string is only HTML escaped, it may still contain Javascript that will
Chris@0	127 execute in that context. If the string is only Javascript-escaped, it may
Chris@0	128 contain HTML markup (new tags and attributes) which will be injected into the
Chris@0	129 DOM and parsed once the inserting Javascript executes. Damned either way? The
Chris@0	130 solution is to escape twice — first escape the string for HTML (make it
Chris@0	131 safe for DOM insertion), and then for Javascript (make it safe for the current
Chris@0	132 Javascript context). Nested contexts are a common means of bypassing naive
Chris@0	133 escaping habits (e.g. you can inject Javascript into a CSS expression within an
Chris@0	134 HTML attribute).
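
The two-step escaping can be sketched as follows. `escapeJsSketch()` is a hypothetical, ASCII-only stand-in for a real Javascript escaper, and the element id `out` is made up for the example.

```php
/**
 * Hypothetical ASCII-only sketch of Javascript string escaping: every
 * ASCII character other than letters, digits, and , . _ becomes \xHH.
 * Bytes >= 0x80 are left as they are (a simplification).
 */
function escapeJsSketch(string $value): string
{
    return preg_replace_callback(
        '/[^a-zA-Z0-9,._\x80-\xFF]/',
        function (array $match): string {
            return sprintf('\\x%02X', ord($match[0]));
        },
        $value
    );
}

$untrusted = '<img src=x onerror=alert(1)>';

// Step 1: make the string safe for insertion into the DOM as HTML.
$htmlSafe = htmlspecialchars($untrusted, ENT_QUOTES, 'UTF-8');

// Step 2: make the result safe inside a Javascript string literal.
$jsSafe = escapeJsSketch($htmlSafe);

echo "<script>document.getElementById('out').innerHTML = '"
    . $jsSafe . "';</script>", PHP_EOL;
```

With zend-escaper itself, the same ordering would presumably be written as `$escaper->escapeJs($escaper->escapeHtml($untrusted))`, using the two methods shown earlier.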
Chris@0	135
Chris@0	136 ### PHP has no known anti-XSS escape functions (only those kidnapped from their original purposes)
Chris@0	137
Chris@0	138 A simple example, widely used, is when you see `json_encode()` used to escape
Chris@0	139 Javascript, or worse, some kind of mutant `addslashes()` implementation. These
Chris@0	140 were never designed to eliminate XSS, yet PHP programmers use them as such. For
Chris@0	141 example, `json_encode()` does not escape the ampersand or semicolon characters
Chris@0	142 by default. That means you can easily inject HTML entities which could then be
Chris@0	143 decoded before the Javascript is evaluated in an HTML document. This lets you
Chris@0	144 break out of strings, add new JS statements, close tags, etc. In other words,
Chris@0	145 using `json_encode()` is insufficient and naive. The same, arguably, could be
Chris@0	146 said for `htmlspecialchars()` which has its own well known limitations that make
Chris@0	147 a singular reliance on it a questionable practice.
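
The default behaviour is easy to confirm. The sketch below also shows PHP's `JSON_HEX_*` flags, which narrow the gap for some characters; whether that is enough depends on the context, so it should not be read as a replacement for a dedicated Javascript escaper.

```php
$value = '&quot;';

// By default the ampersand and the semicolon pass through untouched.
$default = json_encode($value);
// "&quot;"

// The JSON_HEX_* flags encode <, >, &, ' and " as \uXXXX sequences;
// the semicolon is still left as it is.
$hex = json_encode(
    $value,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);
// "\u0026quot;"

echo $default, PHP_EOL, $hex, PHP_EOL;
```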
