comparison vendor/masterminds/html5/README.md @ 0:4c8ae668cc8c
Initial import (non-working)
author | Chris Cannam |
---|---|
date | Wed, 29 Nov 2017 16:09:58 +0000 |
parents | |
children | 129ea1e6d783 |
-1:000000000000 | 0:4c8ae668cc8c |
---|---|
1 # HTML5-PHP | |
2 | |
3 HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. | |
4 It is stable and used in many production websites, and has | |
5 well over [one million downloads](https://packagist.org/packages/masterminds/html5). | |
6 | |
7 HTML5 provides the following features. | |
8 | |
9 - An HTML5 serializer | |
10 - Support for PHP namespaces | |
11 - Composer support | |
12 - Event-based (SAX-like) parser | |
13 - A DOM tree builder | |
14 - Interoperability with [QueryPath](https://github.com/technosophos/querypath) | |
15 - Runs on **PHP** 5.3.0 or newer and **HHVM** 3.2 or newer | |
16 | |
22 | |
23 ## Installation | |
24 | |
25 Install HTML5-PHP using [composer](http://getcomposer.org/). | |
26 | |
27 To install, add `masterminds/html5` to your `composer.json` file: | |
28 | |
29 ```json | |
30 { | |
31 "require" : { | |
32 "masterminds/html5": "2.*" | |
33 } | |
34 } | |
35 ``` | |
36 | |
37 (You may replace `2.*` with a more specific release tag, of | |
38 course.) | |
39 | |
40 From there, use the `composer install` or `composer update` commands to | |
41 install. | |
42 | |
43 ## Basic Usage | |
44 | |
45 HTML5-PHP has a high-level API and a low-level API. | |
46 | |
47 Here is how you use the high-level `HTML5` library API: | |
48 | |
49 ```php | |
50 <?php | |
51 // Assuming you installed from Composer: | |
52 require "vendor/autoload.php"; | |
53 use Masterminds\HTML5; | |
54 | |
55 | |
56 // An example HTML document: | |
57 $html = <<< 'HERE' | |
58 <html> | |
59 <head> | |
60 <title>TEST</title> | |
61 </head> | |
62 <body id='foo'> | |
63 <h1>Hello World</h1> | |
64 <p>This is a test of the HTML5 parser.</p> | |
65 </body> | |
66 </html> | |
67 HERE; | |
68 | |
69 // Parse the document. $dom is a DOMDocument. | |
70 $html5 = new HTML5(); | |
71 $dom = $html5->loadHTML($html); | |
72 | |
73 // Render it as HTML5: | |
74 print $html5->saveHTML($dom); | |
75 | |
76 // Or save it to a file: | |
77 $html5->save($dom, 'out.html'); | |
78 | |
79 ?> | |
80 ``` | |
81 | |
82 The `$dom` created by the parser is a full `DOMDocument` object. And the | |
83 `save()` and `saveHTML()` methods will take any DOMDocument. | |
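Because `saveHTML()` accepts any `DOMDocument`, a tree assembled with PHP's built-in DOM API can be serialized the same way. This is a minimal sketch, assuming the Composer autoloader lives at `vendor/autoload.php`; the element and its text are made up for illustration.

```php
<?php
require 'vendor/autoload.php';

use Masterminds\HTML5;

// Build a small document by hand with PHP's built-in DOM extension.
$dom = new DOMDocument();
$paragraph = $dom->createElement('p', 'Built by hand');
$dom->appendChild($paragraph);

// Serialize it with the HTML5 writer instead of DOMDocument::saveHTML().
$html5 = new HTML5();
$output = $html5->saveHTML($dom);

print $output;
```

The same `$dom` could equally be passed to `save()` to write the result to a file.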
84 | |
85 ### Options | |
86 | |
87 It is possible to pass in an array of configuration options when loading | |
88 an HTML5 document. | |
89 | |
90 ```php | |
91 // An associative array of options | |
92 $options = array( | |
93 'option_name' => 'option_value', | |
94 ); | |
95 | |
96 // Provide the options to the constructor | |
97 $html5 = new HTML5($options); | |
98 | |
99 $dom = $html5->loadHTML($html); | |
100 ``` | |
101 | |
102 The following options are supported: | |
103 | |
104 * `encode_entities` (boolean): Indicates that the serializer should aggressively | |
105 encode characters as entities. Without this, it only encodes the bare | |
106 minimum. | |
107 * `disable_html_ns` (boolean): Prevents the parser from automatically | |
108 assigning the HTML5 namespace to the DOM document. This is for | |
109 non-namespace aware DOM tools. | |
110 * `target_document` (\DOMDocument): A DOM document that will be used as the | |
111 destination for the parsed nodes. | |
112 * `implicit_namespaces` (array): An associative array of namespaces that should be | |
113 used by the parser. The key is the tag prefix; the value is the namespace URI. | |
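As an illustration of `disable_html_ns`: PHP's `DOMXPath` only matches namespaced elements through a registered prefix, so an unprefixed query such as `//h1` is easier to write when the HTML5 namespace is switched off. The following is a sketch under that assumption; the markup and the query are invented for the example.

```php
<?php
require 'vendor/autoload.php';

use Masterminds\HTML5;

// Ask the parser not to assign the HTML5 namespace to the document.
$html5 = new HTML5(array('disable_html_ns' => true));
$dom = $html5->loadHTML('<!DOCTYPE html><html><body><h1>Hello World</h1></body></html>');

// Without a namespace, plain XPath expressions match elements directly.
$xpath = new DOMXPath($dom);
$headings = $xpath->query('//h1');

foreach ($headings as $heading) {
    print $heading->textContent . "\n";
}
```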
114 | |
115 ## The Low-Level API | |
116 | |
117 This library provides the following low-level APIs that you can use to | |
118 create more customized HTML5 tools: | |
119 | |
120 - An `InputStream` abstraction that can work with different kinds of | |
121 input source (not just files and strings). | |
122 - A SAX-like event-based parser that you can hook into for special kinds | |
123 of parsing. | |
124 - A flexible error-reporting mechanism that can be tuned to document | |
125 syntax checking. | |
126 - A DOM implementation that uses PHP's built-in DOM library. | |
127 | |
128 The unit tests exercise each piece of the API, and every public function | |
129 is well-documented. | |
130 | |
131 ### Parser Design | |
132 | |
133 The parser is designed as follows: | |
134 | |
135 - The `InputStream` portion handles direct I/O. | |
136 - The `Scanner` handles scanning on behalf of the parser. | |
137 - The `Tokenizer` requests data from the scanner, parses it, classifies | |
138 it, and sends it to an `EventHandler`. It is a *recursive descent parser.* | |
139 - The `EventHandler` receives notifications and data for each specific | |
140 semantic event that occurs during tokenization. | |
141 - The `DOMBuilder` is an `EventHandler` that listens for tokenizing | |
142 events and builds a document tree (`DOMDocument`) based on the events. | |
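The pipeline described in this list can be wired up by hand with a custom `EventHandler`. The sketch below is not taken from the library's documentation: `TagCounter` is a hypothetical class, and the method signatures and constructor arguments reflect the 2.x source as best understood here, so check them against the installed version before relying on them.

```php
<?php
require 'vendor/autoload.php';

use Masterminds\HTML5\Parser\EventHandler;
use Masterminds\HTML5\Parser\Scanner;
use Masterminds\HTML5\Parser\StringInputStream;
use Masterminds\HTML5\Parser\Tokenizer;

// Hypothetical handler: counts start tags and ignores every other event.
class TagCounter implements EventHandler
{
    public $counts = array();

    public function startTag($name, $attributes = array(), $selfClosing = false)
    {
        if (!isset($this->counts[$name])) {
            $this->counts[$name] = 0;
        }
        $this->counts[$name]++;
    }

    // The remaining events are required by the interface but unused here.
    public function doctype($name, $idType = 0, $id = null, $quirks = false) {}
    public function endTag($name) {}
    public function comment($cdata) {}
    public function text($cdata) {}
    public function eof() {}
    public function parseError($msg, $line, $col) {}
    public function cdata($data) {}
    public function processingInstruction($name, $data = null) {}
}

// InputStream -> Scanner -> Tokenizer -> EventHandler
$input = new StringInputStream('<ul><li>One</li><li>Two</li></ul>');
$scanner = new Scanner($input);
$handler = new TagCounter();
$tokenizer = new Tokenizer($scanner, $handler);
$tokenizer->parse();

print_r($handler->counts);
```

Swapping `TagCounter` for the library's tree builder is, in outline, what the high-level `loadHTML()` call does on your behalf.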
143 | |
144 ### Serializer Design | |
145 | |
146 The serializer takes a data structure (the `DOMDocument`) and transforms | |
147 it into a character representation -- an HTML5 document. | |
148 | |
149 The serializer is broken into three parts: | |
150 | |
151 - The `OutputRules` contain the rules to turn DOM elements into strings. The | |
152 rules are an implementation of the interface `RulesInterface` allowing for | |
153 different rule sets to be used. | |
154 - The `Traverser` is a special-purpose tree walker. It visits | |
155 each node in the tree and uses the `OutputRules` to transform the node | |
156 into a string. | |
157 - `HTML5` manages the `Traverser` and stores the resultant data | |
158 in the correct place. | |
159 | |
160 The serializer (`save()`, `saveHTML()`) follows | |
161 [section 8.9 of the HTML 5.0 spec](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#serializing-html-fragments). | |
162 Tags are therefore serialized according to these rules: | |
163 | |
164 - A tag with children: `<foo>CHILDREN</foo>` | |
165 - A tag that cannot have content: `<foo>` (no closing tag) | |
166 - A tag that could have content, but doesn't: `<foo></foo>` | |
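The three rules can be checked with a round trip through the parser and the serializer. This is a minimal sketch; the sample markup is invented, and `div`, `br`, and `p` stand in for the three cases.

```php
<?php
require 'vendor/autoload.php';

use Masterminds\HTML5;

$html5 = new HTML5();
$dom = $html5->loadHTML(
    '<!DOCTYPE html><html><body><div><em>child</em></div><br><p></p></body></html>'
);
$output = $html5->saveHTML($dom);

// A tag with children keeps its children between open and close tags.
$withChildren = strpos($output, '<div><em>child</em></div>') !== false;

// A tag that cannot have content is written without a closing tag.
$voidTag = strpos($output, '<br>') !== false && strpos($output, '</br>') === false;

// A tag that could have content, but doesn't, still gets a closing tag.
$emptyTag = strpos($output, '<p></p>') !== false;

var_dump($withChildren, $voidTag, $emptyTag);
```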
167 | |
168 ## Known Issues (Or, Things We Designed Against the Spec) | |
169 | |
170 Please check the issue queue for a full list, but the following are | |
171 known issues that are not presently on the roadmap: | |
172 | |
173 - Namespaces: HTML5 only [supports a selected list of namespaces](http://www.w3.org/TR/html5/infrastructure.html#namespaces) | |
174 and they do not operate in the same way as XML namespaces. A `:` has no special | |
175 meaning. | |
176 By default the parser does not support XML-style namespaces via `:`; | |
177 to enable them, see the [XML Namespaces section](#xml-namespaces). | |
178 - Scripts: This parser does not contain a JavaScript or a CSS | |
179 interpreter. While one may be supplied, not all features will be | |
180 supported. | |
181 - Reentrance: The current parser is not re-entrant. (Thus you can't pause | |
182 the parser to modify the HTML string mid-parse.) | |
183 - Validation: The current tree builder is **not** a validating parser. | |
184 While it will correct some HTML, it does not check that the HTML | |
185 conforms to the standard. (Should you wish, you can build a validating | |
186 parser by extending DOMTree or building your own EventHandler | |
187 implementation.) | |
188 * There is limited support for insertion modes. | |
189 * Some autocorrection is done automatically. | |
190 * Per the spec, many legacy tags are admitted and correctly handled, | |
191 even though they are technically not part of HTML5. | |
192 - Attribute names and values: Due to the implementation details of the | |
193 PHP implementation of DOM, attribute names that do not follow the | |
194 XML 1.0 standard are not inserted into the DOM. (Effectively, they | |
195 are ignored.) If you've got a clever fix for this, jump in! | |
196 - Processing Instructions: The HTML5 spec does not allow processing | |
197 instructions. We do. Since this is a server-side library, we think | |
198 this is useful. And that means, dear reader, that in some cases you | |
199 can parse the HTML from a mixed PHP/HTML document. This, however, | |
200 is an incidental feature, not a core feature. | |
201 - HTML manifests: Unsupported. | |
202 - PLAINTEXT: Unsupported. | |
203 - Adoption Agency Algorithm: Not yet implemented. (8.2.5.4.7) | |
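The attribute-name limitation listed above comes from PHP's DOM extension itself and can be reproduced without the parser. This is a small sketch; the attribute names `data-ok` and `1bad` are arbitrary examples.

```php
<?php
// Uses only PHP's built-in DOM extension; no parser involved.
$dom = new DOMDocument();
$element = $dom->createElement('div');

// A name that is valid in XML 1.0 is accepted.
$element->setAttribute('data-ok', '1');

// A name starting with a digit is not a valid XML 1.0 name,
// so the DOM extension refuses it with a DOMException.
$rejected = false;
try {
    $element->setAttribute('1bad', 'x');
} catch (DOMException $e) {
    $rejected = true;
}

var_dump($element->hasAttribute('data-ok'));
var_dump($rejected);
```

A tree builder sitting on top of this extension has little choice but to skip such attributes, which matches the behavior described in the list.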
204 | |
205 ## XML Namespaces | |
206 | |
207 To use XML-style namespaces, you have to configure the main `HTML5` instance accordingly. | |
208 | |
209 ```php | |
210 use Masterminds\HTML5; | |
211 $html = new HTML5(array( | |
212 "xmlNamespaces" => true | |
213 )); | |
214 | |
215 $dom = $html->loadHTML('<t:tag xmlns:t="http://www.example.com"/>'); | |
216 | |
217 $dom->documentElement->namespaceURI; // http://www.example.com | |
218 | |
219 ``` | |
220 | |
221 You can also add default prefixes that do not require a namespace declaration; | |
222 their elements will still be namespaced. | |
223 | |
224 ```php | |
225 use Masterminds\HTML5; | |
226 $html = new HTML5(array( | |
227 "implicitNamespaces"=>array( | |
228 "t"=>"http://www.example.com" | |
229 ) | |
230 )); | |
231 | |
232 $dom = $html->loadHTML('<t:tag/>'); | |
233 | |
234 $dom->documentElement->namespaceURI; // http://www.example.com | |
235 | |
236 ``` | |
237 | |
238 ## Thanks to... | |
239 | |
240 The dedicated (and patient) contributors of patches small and large, | |
241 who have already made this library better. See the CREDITS file for | |
242 a list of contributors. | |
243 | |
244 We owe a huge debt of gratitude to the original authors of html5lib. | |
245 | |
246 While not much of the original parser remains, we learned a lot from | |
247 reading the html5lib library. And some pieces remain here. In | |
248 particular, much of the UTF-8 and Unicode handling is derived from the | |
249 html5lib project. | |
250 | |
251 ## License | |
252 | |
253 This software is released under the MIT license. The original html5lib | |
254 library was also released under the MIT license. | |
255 | |
256 See LICENSE.txt | |
257 | |
258 Certain files contain copyright assertions by specific individuals | |
259 involved with html5lib. Those have been retained where appropriate. |