# Zend\\Feed\\Reader and Security

As with any data coming from a source that is beyond the developer's control,
special attention needs to be given to securing, validating and filtering that
data. Similar to data input to our application by users, data coming from RSS
and Atom feeds should also be considered unsafe and potentially dangerous, as
these formats allow the delivery of HTML and [xHTML](http://tools.ietf.org/html/rfc4287#section-8.1).
Because data validation and filtering are outside `Zend\Feed`'s scope, this task
is left to the developer, who can use libraries such as
[zend-escaper](https://github.com/zendframework/zend-escaper) for escaping and
[HTMLPurifier](http://www.htmlpurifier.org/) for validating and filtering feed
data.

Escaping and filtering of potentially insecure data is highly recommended before
outputting it anywhere in our application or before storing that data in some
storage engine (be it a simple file or a database).

## Filtering data using HTMLPurifier

At the time of writing, the best available library for filtering and validating
(x)HTML data in PHP is [HTMLPurifier](http://www.htmlpurifier.org/), and, as
such, it is the recommended tool for this task. HTMLPurifier works by removing
all (x)HTML from the data except for the tags and attributes explicitly allowed
in a whitelist, and by checking and fixing the nesting of tags, ensuring
standards-compliant output.

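To make the whitelist behaviour concrete, here is a minimal sketch that purifies a hostile snippet with the same `p,a[href]` whitelist used in the example that follows. It assumes HTMLPurifier was installed with Composer (package `ezyang/htmlpurifier`) so that `vendor/autoload.php` exists; the input string is made up for illustration. Anything outside the whitelist, such as the `<script>` element and the `onclick` and `title` attributes, should be dropped, while the allowed `href` survives.

```php
// Assumes a Composer install of ezyang/htmlpurifier
require 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Whitelist: only <p> elements and <a> elements carrying an href attribute
$config->set('HTML.Allowed', 'p,a[href]');
// Definition cache disabled only to keep this sketch free of filesystem setup
$config->set('Cache.DefinitionImpl', null);
$purifier = new HTMLPurifier($config);

// Hypothetical hostile feed description
$dirty = '<p onclick="steal()">Hello <script>alert(1)</script>'
    . '<a href="http://example.com/" title="x">link</a></p>';

$clean = $purifier->purify($dirty);
echo $clean, "\n";
```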
The following example shows basic usage of HTMLPurifier, but developers are
urged to read [HTMLPurifier's documentation](http://www.htmlpurifier.org/docs)
thoroughly.

```php
// Setting HTMLPurifier's options
$options = [
    // Allow only paragraph tags
    // and anchor tags with the href attribute
    [
        'HTML.Allowed',
        'p,a[href]'
    ],
    // Format the final output with Tidy
    [
        'Output.TidyFormat',
        true
    ],
    // Assume XHTML 1.0 Strict Doctype
    [
        'HTML.Doctype',
        'XHTML 1.0 Strict'
    ],
    // Disable cache, but see note after the example
    [
        'Cache.DefinitionImpl',
        null
    ]
];

// Configuring HTMLPurifier
$config = HTMLPurifier_Config::createDefault();
foreach ($options as $option) {
    $config->set($option[0], $option[1]);
}

// Creating an HTMLPurifier instance with its configuration
$purifier = new HTMLPurifier($config);

// Fetch the RSS
try {
    $rss = Zend\Feed\Reader\Reader::import('http://www.planet-php.net/rss/');
} catch (Zend\Feed\Reader\Exception\RuntimeException $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

// Initialize the channel data array
// Note that the channel description is cleaned with HTMLPurifier
$channel = [
    'title'       => $rss->getTitle(),
    'link'        => $rss->getLink(),
    'description' => $purifier->purify($rss->getDescription()),
    'items'       => [],
];

// Loop over each channel item and store relevant data
// Note that each item description is cleaned with HTMLPurifier
foreach ($rss as $item) {
    $channel['items'][] = [
        'title'       => $item->getTitle(),
        'link'        => $item->getLink(),
        'description' => $purifier->purify($item->getDescription()),
    ];
}
```

> ### Tidy is optional
>
> The `Output.TidyFormat` option uses the PHP [Tidy extension](http://php.net/tidy)
> to pretty-print the final output. If this extension is not available, the
> option silently has no effect; its availability has no impact on the
> library's security.

> ### Caching
>
> For the sake of this example, HTMLPurifier's cache is disabled, but it is
> recommended to configure caching and to use HTMLPurifier's standalone include
> file, as both can improve HTMLPurifier's performance substantially.

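As a rough sketch of the caching advice, the snippet below keeps HTMLPurifier's default `Serializer` definition cache and points it at a writable directory through the `Cache.SerializerPath` directive. The directory location is a hypothetical example; HTMLPurifier expects the base directory to already exist and be writable, so the sketch creates it first. It again assumes a Composer install of `ezyang/htmlpurifier`.

```php
// Assumes a Composer install of ezyang/htmlpurifier
require 'vendor/autoload.php';

// Hypothetical cache location; pick a directory the web server can write to
$cacheDir = sys_get_temp_dir() . '/htmlpurifier-cache';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0775, true);
}

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,a[href]');
// Keep the default 'Serializer' definition cache, stored under $cacheDir
$config->set('Cache.SerializerPath', $cacheDir);
$purifier = new HTMLPurifier($config);

// <b> is not whitelisted, so the tag is dropped while its text is kept
$out = $purifier->purify('<p>Cached <b>definitions</b></p>');
echo $out, "\n";
```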
## Escaping data using zend-escaper

To help prevent XSS attacks, Zend Framework provides the [zend-escaper component](https://github.com/zendframework/zend-escaper),
which complies with the current [OWASP recommendations](https://www.owasp.org/index.php/XSS_Prevention_Cheat_Sheet),
and, as such, is the recommended tool for escaping HTML content and attributes,
JavaScript, CSS and URLs before outputting any potentially insecure data to
users.

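Each `Zend\Escaper\Escaper` method targets one output context, and using the wrong one can leave data exploitable. As a quick sketch ahead of the feed example, the snippet below runs the same untrusted string through `escapeHtml()`, `escapeHtmlAttr()`, `escapeJs()`, `escapeCss()` and `escapeUrl()`. It assumes a Composer install of `zendframework/zend-escaper`, and the sample string and URL are made up. Note that `escapeUrl()` is meant for a single URL component, such as a query-string value, rather than for a complete URL.

```php
// Assumes a Composer install of zendframework/zend-escaper
require 'vendor/autoload.php';

$escaper   = new Zend\Escaper\Escaper('utf-8');
$untrusted = '"><script>alert(1)</script>';

// Text placed between HTML tags, e.g. <p>...</p>
$html = $escaper->escapeHtml($untrusted);
// Values placed inside HTML attributes, e.g. title="..."
$attr = $escaper->escapeHtmlAttr($untrusted);
// Values placed inside a JavaScript string literal
$js = $escaper->escapeJs($untrusted);
// Values placed inside a CSS property value
$css = $escaper->escapeCss($untrusted);
// A single URL component (here a query-string value), not a whole URL
$url = 'https://example.com/search?q=' . $escaper->escapeUrl($untrusted);

foreach (compact('html', 'attr', 'js', 'css', 'url') as $context => $value) {
    echo str_pad($context, 6), $value, "\n";
}
```
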
```php
try {
    $rss = Zend\Feed\Reader\Reader::import('http://www.planet-php.net/rss/');
} catch (Zend\Feed\Reader\Exception\RuntimeException $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

// Escaper used for escaping data
$escaper = new Zend\Escaper\Escaper('utf-8');

// Validate all URIs; links that fail validation are stored as null.
// Valid links are escaped with escapeHtmlAttr(), as they are meant to be placed
// in an href attribute. escapeUrl() is not used here: it percent-encodes its
// whole argument and is intended for URL parts such as query-string values.
$linkValidator = new Zend\Validator\Uri();
$link = null;
if ($linkValidator->isValid($rss->getLink())) {
    $link = $escaper->escapeHtmlAttr($rss->getLink());
}

// Initialize the channel data array
$channel = [
    'title'       => $escaper->escapeHtml($rss->getTitle()),
    'link'        => $link,
    'description' => $escaper->escapeHtml($rss->getDescription()),
    'items'       => [],
];

// Loop over each channel item and store relevant data
foreach ($rss as $item) {
    $link = null;
    // Validate the item's own link, not the channel link
    if ($linkValidator->isValid($item->getLink())) {
        $link = $escaper->escapeHtmlAttr($item->getLink());
    }
    $channel['items'][] = [
        'title'       => $escaper->escapeHtml($item->getTitle()),
        'link'        => $link,
        'description' => $escaper->escapeHtml($item->getDescription()),
    ];
}
```

The feed data is now escaped for output in HTML templates. You can, of course,
skip escaping when simply storing the data persistently, but remember to escape
it on output later!

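One caveat applies to links: `Zend\Validator\Uri` checks that a value parses as a URI, and a `javascript:` URI may well pass that check, while escaping a link for HTML output does not change its scheme. When feed links are rendered as clickable anchors, you may therefore want to accept only `http` and `https` links as well. The sketch below is one way to do this with PHP's built-in `parse_url()`; the helper name `isSafeFeedLink()` is hypothetical and not part of zend-feed or zend-validator, and it deliberately rejects relative and protocol-relative links.

```php
/**
 * Hypothetical helper: accept only absolute http(s) links.
 * Escaping does not neutralise a "javascript:" URI placed in an href, so the
 * scheme is checked against an allowlist instead.
 */
function isSafeFeedLink(?string $link): bool
{
    if ($link === null || $link === '') {
        return false;
    }
    // parse_url() returns null when there is no scheme, false for badly malformed URLs
    $scheme = parse_url($link, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false;
    }
    return in_array(strtolower($scheme), ['http', 'https'], true);
}

// Example: report which candidate links would be kept
foreach (['https://www.planet-php.net/', 'javascript:alert(1)', '/relative'] as $candidate) {
    printf("%-30s %s\n", $candidate, isSafeFeedLink($candidate) ? 'allowed' : 'rejected');
}
```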
These are just basic examples, and they cannot cover all possible scenarios
that you, as a developer, can, and most likely will, encounter. Your
responsibility is to learn what libraries and tools are at your disposal, and
when and how to use them to secure your web applications.