comparison vendor/masterminds/html5/src/HTML5/Parser/README.md @ 0:4c8ae668cc8c
Initial import (non-working)
author | Chris Cannam |
---|---|
date | Wed, 29 Nov 2017 16:09:58 +0000 |
# The Parser Model

The parser model here follows the model in section
[8.2.1](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#parsing)
of the HTML5 specification, though we do not assume a networking layer.

     [ InputStream ]    // Generic support for reading input.
           ||
      [ Scanner ]       // Breaks down the stream into characters.
           ||
     [ Tokenizer ]      // Groups characters into syntactic units.
           ||
    [ Tree Builder ]    // Organizes units into a tree of objects.
           ||
    [ DOM Document ]    // The final state of the parsed document.

## InputStream

This is an interface with at least two concrete implementations:

- StringInputStream: Reads an HTML5 string.
- FileInputStream: Reads an HTML5 file.

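To illustrate the idea of one interface with a string-backed and a file-backed implementation, here is a minimal sketch. The names `SketchInputStream`, `SketchStringStream`, `SketchFileStream`, and `drain()` are hypothetical and chosen for this example; they are not the library's actual classes or method signatures.

```php
<?php
// Hypothetical names for illustration; not the library's actual API.
interface SketchInputStream
{
    /** Returns the next byte of input, or null once the input is exhausted. */
    public function read(): ?string;
}

final class SketchStringStream implements SketchInputStream
{
    private string $data;
    private int $pos = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    public function read(): ?string
    {
        if ($this->pos >= strlen($this->data)) {
            return null;
        }
        // Byte-wise for simplicity; a real stream would be encoding-aware.
        return $this->data[$this->pos++];
    }
}

final class SketchFileStream implements SketchInputStream
{
    private SketchStringStream $inner;

    public function __construct(string $path)
    {
        $contents = file_get_contents($path);
        if ($contents === false) {
            throw new RuntimeException("Cannot read $path");
        }
        // Delegate to the string implementation once the file is in memory.
        $this->inner = new SketchStringStream($contents);
    }

    public function read(): ?string
    {
        return $this->inner->read();
    }
}

// Code written against the interface does not care where the input came from.
function drain(SketchInputStream $in): string
{
    $out = '';
    while (($ch = $in->read()) !== null) {
        $out .= $ch;
    }
    return $out;
}
```

Under these assumptions, `drain(new SketchStringStream($html))` and `drain(new SketchFileStream($path))` behave identically whenever the file holds the same bytes as the string.
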
## Scanner

This is a mechanical piece of the parser: it walks the input stream and
hands characters to the tokenizer one at a time.

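As a rough picture of what "mechanical" means here, the following toy scanner keeps a cursor over a string and exposes the current character, a one-character lookahead, and a step operation. `SketchScanner` and its method names are assumptions made for this sketch, not the library's Scanner API.

```php
<?php
// Hypothetical toy scanner; class and method names are illustrative only.
final class SketchScanner
{
    private string $data;
    private int $pos = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    /** Character at the cursor, or null at end of input. */
    public function current(): ?string
    {
        return $this->pos < strlen($this->data) ? $this->data[$this->pos] : null;
    }

    /** Look at the following character without moving the cursor. */
    public function peek(): ?string
    {
        $i = $this->pos + 1;
        return $i < strlen($this->data) ? $this->data[$i] : null;
    }

    /** Advance the cursor by one and return the new current character. */
    public function next(): ?string
    {
        if ($this->pos < strlen($this->data)) {
            $this->pos++;
        }
        return $this->current();
    }
}
```

The scanner makes no decisions about syntax; deciding what a `<` or `&` means is left to the tokenizer.
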
## Tokenizer

This follows section 8.2.4 of the HTML5 spec. It is (roughly) a recursive
descent parser, though it includes plenty of optimizations that are less
than purely functional.

## EventHandler and DOMTree

EventHandler is the interface for tree builders. Since not all
implementations will necessarily build trees, we've chosen a more
generic name.

The tokenizer emits tokens to the event handler during tokenization.

The DOMTree is an event handler that builds a DOM tree. The output of
the DOMTree builder is a DOMDocument.

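The split between tokenizer and event handler can be sketched as follows. This is a deliberately tiny model, not the library's code: `SketchEventHandler`, `TagCounter`, `SketchDomBuilder`, and `sketchTokenize()` are hypothetical names, and the toy tokenizer understands only `<name>`, `</name>`, and text inside a single root element. One handler builds a tree with PHP's built-in DOM classes; the other builds no tree at all, which is the reason for the more generic name.

```php
<?php
// Hypothetical, simplified interface; names are assumptions for this sketch.
interface SketchEventHandler
{
    public function startTag(string $name): void;
    public function endTag(string $name): void;
    public function text(string $data): void;
}

// A handler that builds no tree: it only counts start tags.
final class TagCounter implements SketchEventHandler
{
    public array $counts = [];

    public function startTag(string $name): void
    {
        $this->counts[$name] = ($this->counts[$name] ?? 0) + 1;
    }

    public function endTag(string $name): void {}

    public function text(string $data): void {}
}

// A handler that builds a DOM tree with the built-in DOMDocument class.
// Assumes well-nested input with exactly one root element.
final class SketchDomBuilder implements SketchEventHandler
{
    private DOMDocument $doc;
    private DOMNode $current;

    public function __construct()
    {
        $this->doc = new DOMDocument();
        $this->current = $this->doc;
    }

    public function startTag(string $name): void
    {
        // Append the new element and descend into it.
        $this->current = $this->current->appendChild($this->doc->createElement($name));
    }

    public function endTag(string $name): void
    {
        // Climb back to the parent, but never above the document itself.
        if ($this->current->parentNode !== null) {
            $this->current = $this->current->parentNode;
        }
    }

    public function text(string $data): void
    {
        $this->current->appendChild($this->doc->createTextNode($data));
    }

    public function document(): DOMDocument
    {
        return $this->doc;
    }
}

// Toy tokenizer: no attributes, comments, entities, or error recovery.
function sketchTokenize(string $html, SketchEventHandler $events): void
{
    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)>|[^<]+~', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (($m[2] ?? '') === '') {
            $events->text($m[0]);      // text alternative matched, no tag name
        } elseif ($m[1] === '/') {
            $events->endTag($m[2]);
        } else {
            $events->startTag($m[2]);
        }
    }
}
```

The same token stream can drive either handler: pass a `TagCounter` to gather statistics, or a `SketchDomBuilder` and then call `document()` to obtain the resulting `DOMDocument`.
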
## DOMDocument

PHP has a built-in DOMDocument class (technically, it is provided by the
DOM extension, which is built on libxml). We use that, so the output of
this process is compatible with SimpleXML, QueryPath, and many other
XML/HTML processing tools.

When the HTML5 input is a fragment of an HTML5 document, a
DOMDocumentFragment is returned instead. This is another built-in class.
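
The compatibility claim can be illustrated with PHP's built-in classes alone. In this sketch the document is built by hand as a stand-in for a parser result (the parser itself is not invoked), then handed to SimpleXML through `simplexml_import_dom()`. It assumes the DOM and SimpleXML extensions are enabled. The second half shows how a `DOMDocumentFragment` differs: it holds sibling nodes without a single root element.

```php
<?php
// Stand-in for a parser result, built by hand with the built-in DOM classes.
$doc = new DOMDocument();
$root = $doc->appendChild($doc->createElement('section'));
$root->appendChild($doc->createElement('h1', 'Title'));

// Any DOM-aware tool can take over from here, for example SimpleXML.
$xml = simplexml_import_dom($doc);
$title = (string) $xml->h1;

// A fragment has no single root element; it holds a list of sibling nodes
// and is created by (and belongs to) an owner document.
$fragment = $doc->createDocumentFragment();
$fragment->appendChild($doc->createElement('p', 'first'));
$fragment->appendChild($doc->createElement('p', 'second'));
$siblings = $fragment->childNodes->length;
```

Here `$xml` wraps the `section` root element, so `$xml->h1` reaches the heading directly, and `$siblings` counts the two top-level `p` nodes held by the fragment.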