# Zend\\Feed\\Reader and Security

As with any data coming from a source that is beyond the developer's control,
special attention needs to be given to securing, validating, and filtering that
data. Like data that users submit to our application, data coming from RSS
and Atom feeds should be considered unsafe and potentially dangerous, as it
allows the delivery of HTML and [XHTML](http://tools.ietf.org/html/rfc4287#section-8.1).
Because data validation and filtering are outside `Zend\Feed`'s scope, this task
is left to the developer, using libraries such as
zend-escaper for escaping and [HTMLPurifier](http://www.htmlpurifier.org/) for
validating and filtering feed data.

Escaping and filtering of potentially insecure data is highly recommended before
outputting it anywhere in our application or before storing it in a
storage engine (be it a simple file or a database).

## Filtering data using HTMLPurifier

Currently, [HTMLPurifier](http://www.htmlpurifier.org/) is among the best
available libraries for filtering and validating (x)HTML data in PHP and, as
such, is the recommended tool for this task. HTMLPurifier works by filtering out
all (x)HTML from the data, except for the tags and attributes specifically
allowed in a whitelist, and by checking and fixing the nesting of tags, ensuring
standards-compliant output.

The following example shows basic usage of HTMLPurifier, but developers
are urged to read [HTMLPurifier's documentation](http://www.htmlpurifier.org/docs).

```php
// Setting HTMLPurifier's options
$options = [
    // Allow only paragraph tags
    // and anchor tags with the href attribute
    [
        'HTML.Allowed',
        'p,a[href]'
    ],
    // Format end output with Tidy
    [
        'Output.TidyFormat',
        true
    ],
    // Assume XHTML 1.0 Strict Doctype
    [
        'HTML.Doctype',
        'XHTML 1.0 Strict'
    ],
    // Disable cache, but see note after the example
    [
        'Cache.DefinitionImpl',
        null
    ]
];

// Configuring HTMLPurifier
$config = HTMLPurifier_Config::createDefault();
foreach ($options as $option) {
    $config->set($option[0], $option[1]);
}

// Creating an HTMLPurifier instance with its config
$purifier = new HTMLPurifier($config);

// Fetch the RSS
try {
    $rss = Zend\Feed\Reader\Reader::import('http://www.planet-php.net/rss/');
} catch (Zend\Feed\Reader\Exception\RuntimeException $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

// Initialize the channel data array
// Note that we're cleaning the description with HTMLPurifier
$channel = [
    'title'       => $rss->getTitle(),
    'link'        => $rss->getLink(),
    'description' => $purifier->purify($rss->getDescription()),
    'items'       => [],
];

// Loop over each channel item and store relevant data
// Note that we're cleaning the descriptions with HTMLPurifier
foreach ($rss as $item) {
    $channel['items'][] = [
        'title'       => $item->getTitle(),
        'link'        => $item->getLink(),
        'description' => $purifier->purify($item->getDescription()),
    ];
}
```

> ### Tidy is required
>
> HTMLPurifier uses the PHP [Tidy extension](http://php.net/tidy) to clean
> and repair the final output. If this extension is not available, the Tidy
> formatting step fails silently, but its availability has no impact on the
> library's security.

> ### Caching
>
> For the sake of this example, HTMLPurifier's cache is disabled, but it is
> recommended to configure caching and use its standalone include file, as this
> can improve the performance of HTMLPurifier substantially.

## Escaping data using zend-escaper

To help prevent XSS attacks, Zend Framework provides the [zend-escaper component](https://github.com/zendframework/zend-escaper),
which complies with the current [OWASP recommendations](https://www.owasp.org/index.php/XSS_Prevention_Cheat_Sheet)
and, as such, is the recommended tool for escaping HTML tags and attributes,
JavaScript, CSS, and URLs before outputting any potentially insecure data to
users.

```php
try {
    $rss = Zend\Feed\Reader\Reader::import('http://www.planet-php.net/rss/');
} catch (Zend\Feed\Reader\Exception\RuntimeException $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

// Validate all URIs
$linkValidator = new Zend\Validator\Uri();
$link = null;
if ($linkValidator->isValid($rss->getLink())) {
    $link = $rss->getLink();
}

// Escaper used for escaping data
$escaper = new Zend\Escaper\Escaper('utf-8');

// Initialize the channel data array
$channel = [
    'title'       => $escaper->escapeHtml($rss->getTitle()),
    'link'        => $escaper->escapeUrl($link),
    'description' => $escaper->escapeHtml($rss->getDescription()),
    'items'       => [],
];

// Loop over each channel item and store relevant data
foreach ($rss as $item) {
    // Validate the item's own link, not the channel's
    $link = null;
    if ($linkValidator->isValid($item->getLink())) {
        $link = $item->getLink();
    }
    $channel['items'][] = [
        'title'       => $escaper->escapeHtml($item->getTitle()),
        'link'        => $escaper->escapeUrl($link),
        'description' => $escaper->escapeHtml($item->getDescription()),
    ];
}
```

The feed data is now safe for output to HTML templates. You can, of course, skip
escaping when simply storing the data persistently, but remember to escape it on
output later!

These are just basic examples and cannot cover every scenario that you, as a
developer, can, and most likely will, encounter. It is your responsibility to
learn what libraries and tools are at your disposal, and when and how to use
them to secure your web applications.
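
The advice above about storing feed data unescaped and escaping it on output can be sketched as follows. This is a minimal illustration, not part of zend-feed: the `$storedItem` array and the `escapeForHtml()` and `renderItem()` helpers are hypothetical, and PHP's built-in `htmlspecialchars()` stands in for zend-escaper's `escapeHtml()` so that the snippet runs without any dependencies.

```php
<?php
// Hypothetical stored feed item: raw, unescaped values exactly as they
// arrived from the feed (for example, loaded back from a database).
$storedItem = [
    'title'       => 'Release <b>1.0</b> & "more"',
    'description' => '<script>alert(1)</script>Summary',
];

/**
 * Escape a value for an HTML body context at output time.
 * Assumption: htmlspecialchars() is used here as a dependency-free
 * stand-in for Zend\Escaper\Escaper::escapeHtml().
 */
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

/**
 * Render one stored item as an HTML fragment (hypothetical helper).
 * Escaping happens here, at the point of output, not at storage time.
 */
function renderItem(array $item): string
{
    return sprintf(
        "<h2>%s</h2>\n<p>%s</p>",
        escapeForHtml($item['title']),
        escapeForHtml($item['description'])
    );
}

echo renderItem($storedItem), "\n";
```

Keeping the stored copy raw means the same data can later be escaped differently for other output contexts (an HTML attribute, a URL parameter, JavaScript), each of which needs its own escaping method.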