annotate vendor/fabpot/goutte/README.rst @ 19:fa3358dc1485 tip

Add ndrum files
author Chris Cannam
date Wed, 28 Aug 2019 13:14:47 +0100
parents 4c8ae668cc8c
Goutte, a simple PHP Web Scraper
================================

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML
responses.

Requirements
------------

Goutte depends on PHP 5.5+ and Guzzle 6+.

.. tip::

    If you need support for PHP 5.4 or Guzzle 4-5, use Goutte 2.x (latest `phar
    <https://github.com/FriendsOfPHP/Goutte/releases/download/v2.0.4/goutte-v2.0.4.phar>`_).

    If you need support for PHP 5.3 or Guzzle 3, use Goutte 1.x (latest `phar
    <https://github.com/FriendsOfPHP/Goutte/releases/download/v1.0.7/goutte-v1.0.7.phar>`_).

Installation
------------

Use `Composer`_ to add ``fabpot/goutte`` as a dependency in your
``composer.json`` file:

.. code-block:: bash

    composer require fabpot/goutte

Usage
-----

Create a Goutte Client instance (which extends
``Symfony\Component\BrowserKit\Client``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the ``request()`` method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a ``Crawler`` object
(``Symfony\Component\DomCrawler\Crawler``).

To use your own Guzzle settings, create a Guzzle 6 client and pass it to
Goutte. For example, to add a 60-second request timeout:

.. code-block:: php

    use Goutte\Client;
    use GuzzleHttp\Client as GuzzleClient;

    $goutteClient = new Client();
    $guzzleClient = new GuzzleClient(array(
        'timeout' => 60,
    ));
    $goutteClient->setClient($guzzleClient);

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);
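
Before following a link, it can help to check where it points. The sketch
below is an illustration only: it builds a ``Crawler`` from an inline HTML
string so that it runs without network access. The markup, the ``$html``
variable and the ``/blog/category/security-advisories`` path are made up, and
the Composer autoloader is assumed to live at ``vendor/autoload.php``:

.. code-block:: php

    <?php
    require __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a fetched page.
    $html = '<html><body>'
        .'<a href="/blog/category/security-advisories">Security Advisories</a>'
        .'</body></html>';

    // The second argument is the URI of the page; relative links are
    // resolved against it.
    $crawler = new Crawler($html, 'https://symfony.com/blog/');

    $link = $crawler->selectLink('Security Advisories')->link();

    // Absolute URI that $client->click($link) would request.
    $uri = $link->getUri();
    print $uri."\n";

A crawler returned by ``$client->request()`` already carries the URI of the
fetched page, so the same ``link()->getUri()`` call should work on it
unchanged.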

Extract data:

.. code-block:: php

    // Get the latest posts in this category and display their titles
    $crawler->filter('h2 > a')->each(function ($node) {
        print $node->text()."\n";
    });
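
``each()`` also returns an array of the closure's return values, which is
handy for collecting data instead of printing it. A minimal sketch, run on
made-up inline markup (the post titles and paths are hypothetical) so that it
works offline, assuming ``vendor/autoload.php`` exists:

.. code-block:: php

    <?php
    require __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a blog listing page.
    $html = '<html><body>'
        .'<h2><a href="/blog/first-post">First post</a></h2>'
        .'<h2><a href="/blog/second-post">Second post</a></h2>'
        .'</body></html>';

    $crawler = new Crawler($html, 'https://symfony.com/blog/');

    // Build one array entry per matched node.
    $posts = $crawler->filter('h2 > a')->each(function ($node) {
        return array(
            'title' => $node->text(),
            'href' => $node->attr('href'),
        );
    });

    print_r($posts);

Note that ``attr('href')`` returns the attribute as written in the markup; use
``link()->getUri()`` when an absolute URI is needed.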

Submit forms:

.. code-block:: php

    $crawler = $client->request('GET', 'https://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, array('login' => 'fabpot', 'password' => 'xxxxxx'));
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });
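
Form values can also be set on the ``Form`` object itself, which supports
array access, before the form is submitted. The sketch below parses a made-up
login form from an inline string, so nothing is sent over the network. The
field names, the ``/session`` action and the ``example.com`` host are
hypothetical, and ``vendor/autoload.php`` is assumed to exist:

.. code-block:: php

    <?php
    require __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical login form standing in for a fetched page.
    $html = '<html><body><form action="/session" method="post">'
        .'<input type="text" name="login" />'
        .'<input type="password" name="password" />'
        .'<input type="submit" value="Sign in" />'
        .'</form></body></html>';

    $crawler = new Crawler($html, 'https://example.com/login');
    $form = $crawler->selectButton('Sign in')->form();

    // Fill in the fields one by one instead of passing an array to submit().
    $form['login'] = 'fabpot';
    $form['password'] = 'xxxxxx';

    $values = $form->getValues(); // field name => value pairs to be sent
    $method = $form->getMethod(); // HTTP method taken from the form tag
    $target = $form->getUri();    // absolute URI the form is submitted to

    print $method.' '.$target."\n";
    print_r($values);

With a Goutte client, ``$client->submit($form)`` would then send these values.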

More Information
----------------

Read the documentation for the `BrowserKit`_ and `DomCrawler`_ Symfony
Components for more information about what you can do with Goutte.

Pronunciation
-------------

Goutte is pronounced ``goot``, i.e. it rhymes with ``boot``, not ``out``.

Technical Information
---------------------

Goutte is a thin wrapper around the following fine PHP libraries:

* Symfony Components: `BrowserKit`_, `CssSelector`_ and `DomCrawler`_;

* `Guzzle`_ HTTP Component.

License
-------

Goutte is licensed under the MIT license.

.. _`Composer`: https://getcomposer.org
.. _`Guzzle`: http://docs.guzzlephp.org
.. _`BrowserKit`: https://symfony.com/components/BrowserKit
.. _`DomCrawler`: https://symfony.com/doc/current/components/dom_crawler.html
.. _`CssSelector`: https://symfony.com/doc/current/components/css_selector.html