# Zend\\Feed\\Reader and Security

As with any data coming from a source beyond the developer's control, special
attention must be given to securing, validating, and filtering that data. Like
user input, data coming from RSS and Atom feeds should be considered unsafe and
potentially dangerous, because feeds allow the delivery of HTML and
[xHTML](http://tools.ietf.org/html/rfc4287#section-8.1). Because data validation
and filtering are outside `Zend\Feed`'s scope, this task is left to the
developer, using libraries such as zend-escaper for escaping and
[HTMLPurifier](http://www.htmlpurifier.org/) for validating and filtering feed
data.

Escaping and filtering potentially insecure data is highly recommended before
outputting it anywhere in our application or storing it in a storage engine
(be it a simple file or a database).

## Filtering data using HTMLPurifier

Currently, the best available library for filtering and validating (x)HTML data
in PHP is [HTMLPurifier](http://www.htmlpurifier.org/), and, as such, it is the
recommended tool for this task. HTMLPurifier works by filtering out all (x)HTML
from the data except for the tags and attributes specifically allowed in a
whitelist, and by checking and fixing the nesting of tags, ensuring
standards-compliant output.

The following example shows basic usage of HTMLPurifier, but developers are
urged to read [HTMLPurifier's documentation](http://www.htmlpurifier.org/docs).

```php
// Setting HTMLPurifier's options
$options = [
    // Allow only paragraph tags
    // and anchor tags with the href attribute
    [
        'HTML.Allowed',
        'p,a[href]'
    ],
    // Format end output with Tidy
    [
        'Output.TidyFormat',
        true
    ],
    // Assume XHTML 1.0 Strict Doctype
    [
        'HTML.Doctype',
        'XHTML 1.0 Strict'
    ],
    // Disable cache, but see note after the example
    [
        'Cache.DefinitionImpl',
        null
    ]
];

// Configuring HTMLPurifier
$config = HTMLPurifier_Config::createDefault();
foreach ($options as $option) {
    $config->set($option[0], $option[1]);
}

// Creating an HTMLPurifier instance with its config
$purifier = new HTMLPurifier($config);

// Fetch the RSS
try {
    $rss = Zend\Feed\Reader\Reader::import('http://www.planet-php.net/rss/');
} catch (Zend\Feed\Reader\Exception\RuntimeException $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

// Initialize the channel data array
// Note that we're cleaning the description with HTMLPurifier
$channel = [
    'title'       => $rss->getTitle(),
    'link'        => $rss->getLink(),
    'description' => $purifier->purify($rss->getDescription()),
    'items'       => [],
];

// Loop over each channel item and store relevant data
// Note that we're cleaning the descriptions with HTMLPurifier
foreach ($rss as $item) {
    $channel['items'][] = [
        'title'       => $item->getTitle(),
        'link'        => $item->getLink(),
        'description' => $purifier->purify($item->getDescription()),
    ];
}
```

> ### Tidy is required
>
> With `Output.TidyFormat` enabled, HTMLPurifier uses the PHP
> [Tidy extension](http://php.net/tidy) to clean and repair the final output.
> If this extension is not available, the formatting step silently fails, but
> its availability has no impact on the library's security.

> ### Caching
>
> For the sake of this example, HTMLPurifier's cache is disabled, but it is
> recommended to configure caching and use its standalone include file, as this
> can improve the performance of HTMLPurifier substantially.
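
Enabling the cache could look like the following sketch. It assumes HTMLPurifier
was installed with Composer and that the hypothetical directory
`data/cache/htmlpurifier` suits your application; `Cache.SerializerPath` is the
HTMLPurifier directive that sets where serialized definitions are stored.

```php
// Assumption: HTMLPurifier is available through Composer's autoloader.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical cache directory; it must exist and be writable by PHP.
$cacheDir = __DIR__ . '/data/cache/htmlpurifier';
if (! is_dir($cacheDir)) {
    mkdir($cacheDir, 0755, true);
}

$config = HTMLPurifier_Config::createDefault();

// Leave Cache.DefinitionImpl at its default so caching stays enabled,
// and point the serializer cache at our own directory.
$config->set('Cache.SerializerPath', $cacheDir);
$config->set('HTML.Allowed', 'p,a[href]');

$purifier = new HTMLPurifier($config);
```

The first request builds and stores the definitions; later requests reuse them
from the cache directory.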

## Escaping data using zend-escaper

To help prevent XSS attacks, Zend Framework provides the [zend-escaper component](https://github.com/zendframework/zend-escaper),
which complies with the current [OWASP recommendations](https://www.owasp.org/index.php/XSS_Prevention_Cheat_Sheet)
and, as such, is the recommended tool for escaping HTML tags and attributes,
JavaScript, CSS, and URLs before outputting any potentially insecure data to
users.

```php
try {
    $rss = Zend\Feed\Reader\Reader::import('http://www.planet-php.net/rss/');
} catch (Zend\Feed\Reader\Exception\RuntimeException $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

// Validate all URIs
$linkValidator = new Zend\Validator\Uri();
$link = null;
if ($linkValidator->isValid($rss->getLink())) {
    $link = $rss->getLink();
}

// Escaper used for escaping data
$escaper = new Zend\Escaper\Escaper('utf-8');

// Initialize the channel data array
// escapeUrl() is meant for URL components, not whole URLs, so the validated
// link is escaped for use inside an HTML attribute instead
$channel = [
    'title'       => $escaper->escapeHtml($rss->getTitle()),
    'link'        => $link === null ? null : $escaper->escapeHtmlAttr($link),
    'description' => $escaper->escapeHtml($rss->getDescription()),
    'items'       => [],
];

// Loop over each channel item and store relevant data
foreach ($rss as $item) {
    // Validate the item's own link, not the channel's
    $link = null;
    if ($linkValidator->isValid($item->getLink())) {
        $link = $item->getLink();
    }
    $channel['items'][] = [
        'title'       => $escaper->escapeHtml($item->getTitle()),
        'link'        => $link === null ? null : $escaper->escapeHtmlAttr($link),
        'description' => $escaper->escapeHtml($item->getDescription()),
    ];
}
```
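
Syntactic URI validation does not necessarily restrict which scheme a link
uses, so a feed could still supply something like a `javascript:` link. A
minimal sketch of a scheme allowlist follows, using only PHP's built-in
`parse_url()`. The helper name `filterFeedLink()` and the default allowlist are
assumptions for illustration; they are not part of zend-feed or zend-validator.

```php
/**
 * Return the link only if it is absolute and uses an allowed scheme.
 * Hypothetical helper for illustration.
 */
function filterFeedLink(?string $url, array $allowedSchemes = ['http', 'https']): ?string
{
    if ($url === null) {
        return null;
    }

    $url = trim($url);
    if ($url === '') {
        return null;
    }

    // parse_url() yields null when there is no scheme, false on malformed input
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (! is_string($scheme)) {
        return null;
    }

    return in_array(strtolower($scheme), $allowedSchemes, true) ? $url : null;
}

var_dump(filterFeedLink('https://example.com/post/1'));
var_dump(filterFeedLink('javascript:alert(1)'));
```

A check like this complements, rather than replaces, validation and escaping:
the link still needs to be escaped for the context in which it is output.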

The feed data is now safe for output to HTML templates. You can, of course, skip
escaping when simply storing the data persistently, but remember to escape it on
output later!
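
Escaping at output time can be sketched as follows. The example uses PHP's
built-in `htmlspecialchars()` as a stand-in so that it runs without
dependencies; `renderItem()` and the sample item are hypothetical. With
zend-escaper, you could pass `[$escaper, 'escapeHtml']` and
`[$escaper, 'escapeHtmlAttr']` as the two callables instead.

```php
// Hypothetical helper: render one stored (unescaped) feed item as a list entry.
// The escapers are passed in so that any implementation can be plugged in.
function renderItem(array $item, callable $escapeHtml, callable $escapeAttr): string
{
    return sprintf(
        '<li><a href="%s">%s</a></li>',
        $escapeAttr($item['link']),
        $escapeHtml($item['title'])
    );
}

// Stand-in escaper built on htmlspecialchars(); an assumption for this sketch.
$escape = static function (string $value): string {
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
};

// Sample item as it might be stored: raw, exactly as received from the feed.
$stored = [
    'title' => '<script>alert(1)</script>',
    'link'  => 'https://example.com/?a=1&b=2',
];

echo renderItem($stored, $escape, $escape), "\n";
```

Keeping the stored data raw and escaping in the template means each value is
escaped once, for the context in which it actually appears.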

These are just basic examples and cannot cover every scenario that you, as a
developer, can and most likely will encounter. It is your responsibility to
learn which libraries and tools are at your disposal, and when and how to use
them to secure your web applications.