annotate vendor/masterminds/html5/src/HTML5/Parser/README.md @ 19:fa3358dc1485 tip

Add ndrum files
author Chris Cannam
date Wed, 28 Aug 2019 13:14:47 +0100
# The Parser Model

The parser model here follows the model in section
[8.2.1](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#parsing)
of the HTML5 specification, though we do not assume a networking layer.

    [ InputStream ]    // Generic support for reading input.
           ||
    [ Scanner ]        // Breaks down the stream into characters.
           ||
    [ Tokenizer ]      // Groups characters into syntactic units.
           ||
    [ Tree Builder ]   // Organizes units into a tree of objects.
           ||
    [ DOM Document ]   // The final state of the parsed document.

## InputStream

This is an interface with at least two concrete implementations:

- StringInputStream: Reads an HTML5 string.
- FileInputStream: Reads an HTML5 file.

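The split between one interface and two sources can be sketched as follows. This is only an illustration: the names `SketchInputStream`, `SketchStringInput`, `SketchFileInput`, and `inputLength`, and the single `read()` method, are assumptions made for this example and are not the library's actual interface.

```php
<?php
// Hypothetical sketch: names and methods are illustrative, not the
// library's real InputStream API.
interface SketchInputStream
{
    /** Return the whole input as a string. */
    public function read(): string;
}

/** Wraps markup that is already in memory. */
final class SketchStringInput implements SketchInputStream
{
    private string $data;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    public function read(): string
    {
        return $this->data;
    }
}

/** Reads markup from a file on disk. */
final class SketchFileInput implements SketchInputStream
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function read(): string
    {
        $data = file_get_contents($this->path);
        if ($data === false) {
            throw new RuntimeException("Cannot read {$this->path}");
        }
        return $data;
    }
}

// Callers depend only on the interface, not on where the bytes came from.
function inputLength(SketchInputStream $in): int
{
    return strlen($in->read());
}
```

Because later stages only see the interface, a string and a file can be fed through the same pipeline without any change downstream.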
## Scanner

This is a mechanical piece of the parser: it breaks the input stream
down into characters for the tokenizer to consume.

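A scanner of this kind is essentially a cursor over the input. The sketch below assumes a hypothetical `SketchScanner` class with `current()`, `next()`, `peek()`, and `position()` methods; these are illustrative and may not match the library's Scanner. For brevity it walks the string byte by byte, so it does not handle multi-byte characters.

```php
<?php
// Hypothetical sketch, not the library's Scanner: a cursor over a string.
// It works on bytes, so multi-byte UTF-8 characters are not handled.
final class SketchScanner
{
    private string $data;
    private int $pos = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    /** Character under the cursor, or null at end of input. */
    public function current(): ?string
    {
        return $this->pos < strlen($this->data) ? $this->data[$this->pos] : null;
    }

    /** Advance the cursor and return the new current character. */
    public function next(): ?string
    {
        if ($this->pos < strlen($this->data)) {
            $this->pos++;
        }
        return $this->current();
    }

    /** Look at the following character without moving the cursor. */
    public function peek(): ?string
    {
        $i = $this->pos + 1;
        return $i < strlen($this->data) ? $this->data[$i] : null;
    }

    /** Offset of the cursor from the start of the input. */
    public function position(): int
    {
        return $this->pos;
    }
}
```

Keeping lookahead (`peek()`) separate from consumption (`next()`) lets a tokenizer decide which rule applies before it commits to reading a character.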
## Tokenizer

This follows section 8.2.4 of the HTML5 spec. It is (roughly) a recursive
descent parser, though there are plenty of optimizations that are less
than purely functional.

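To give a feel for the approach, here is a toy tokenizer in which each kind of token is consumed by its own function, in the spirit of recursive descent. It is a deliberately tiny sketch and not the library's Tokenizer: it only understands text, `<name>`, and `</name>`, and the function names and the `[type, value]` token shape are assumptions made for this example.

```php
<?php
// Hypothetical toy tokenizer: handles only text, <name> and </name>.
// Attributes, comments, entities and error recovery are all ignored.
function sketchTokenize(string $html): array
{
    $tokens = [];
    $i = 0;
    $n = strlen($html);
    while ($i < $n) {
        // Dispatch on the current character, one function per token kind.
        if ($html[$i] === '<') {
            [$token, $i] = sketchTag($html, $i + 1);
        } else {
            [$token, $i] = sketchText($html, $i);
        }
        $tokens[] = $token;
    }
    return $tokens;
}

/** Consume a tag starting just after '<'; returns [token, next offset]. */
function sketchTag(string $html, int $i): array
{
    $type = 'startTag';
    if (substr($html, $i, 1) === '/') {
        $type = 'endTag';
        $i++;
    }
    $end = strpos($html, '>', $i);
    if ($end === false) {
        // Unterminated tag: treat the rest of the input as the name.
        $end = strlen($html);
    }
    $name = strtolower(substr($html, $i, $end - $i));
    return [[$type, $name], min($end + 1, strlen($html))];
}

/** Consume text up to the next '<'; returns [token, next offset]. */
function sketchText(string $html, int $i): array
{
    $len = strcspn($html, '<', $i);
    return [['text', substr($html, $i, $len)], $i + $len];
}
```

For example, `sketchTokenize('<p>Hi</p>')` yields a start tag token for `p`, a text token for `Hi`, and an end tag token for `p`.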
## EventHandler and DOMTree

EventHandler is the interface for tree builders. Since not all
implementations will necessarily build trees, we've chosen a more
generic name.

The tokenizer emits tokens to the event handler during tokenization.

The DOMTree is an event handler that builds a DOM tree. The output of
the DOMTree builder is a DOMDocument.

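The reason for the generic name is easier to see with two handlers side by side: one that builds a tree and one that does not. The sketch below is an assumption-laden illustration: `SketchEvents`, `TagCounter`, `SketchDomBuilder`, and `replay()` are hypothetical names, and the three-method interface is far smaller than a real event handler would need. Only `DOMDocument` and its methods are PHP built-ins.

```php
<?php
// Hypothetical sketch: this interface is much smaller than the library's
// EventHandler, and its method names are illustrative.
interface SketchEvents
{
    public function startTag(string $name): void;
    public function endTag(string $name): void;
    public function text(string $data): void;
}

/** A handler that builds nothing: it only counts start tags. */
final class TagCounter implements SketchEvents
{
    public array $counts = [];

    public function startTag(string $name): void
    {
        $this->counts[$name] = ($this->counts[$name] ?? 0) + 1;
    }

    public function endTag(string $name): void
    {
    }

    public function text(string $data): void
    {
    }
}

/** A handler that builds a tree with PHP's built-in DOM classes. */
final class SketchDomBuilder implements SketchEvents
{
    private DOMDocument $doc;
    private DOMNode $current;

    public function __construct()
    {
        $this->doc = new DOMDocument();
        $this->current = $this->doc;
    }

    public function startTag(string $name): void
    {
        $el = $this->doc->createElement($name);
        $this->current->appendChild($el);
        $this->current = $el;
    }

    public function endTag(string $name): void
    {
        // Naive: assumes well-nested input, unlike a real tree builder.
        if ($this->current->parentNode !== null) {
            $this->current = $this->current->parentNode;
        }
    }

    public function text(string $data): void
    {
        $this->current->appendChild($this->doc->createTextNode($data));
    }

    public function document(): DOMDocument
    {
        return $this->doc;
    }
}

/** Stand-in for a tokenizer: replays a fixed token list into a handler. */
function replay(array $tokens, SketchEvents $events): void
{
    foreach ($tokens as [$type, $value]) {
        $events->$type($value);
    }
}
```

Replaying the same token list, such as `[['startTag', 'p'], ['text', 'Hi'], ['endTag', 'p']]`, into either handler gives a tag count in one case and a `DOMDocument` in the other, without the token source knowing which.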
## DOMDocument

PHP has a built-in DOMDocument class (technically, it's part of the DOM
extension, which is built on libxml). We use that, which makes the output
of this process compatible with SimpleXML, QueryPath, and many other
XML/HTML processing tools.

For cases where the HTML5 is a fragment of an HTML5 document, a
DOMDocumentFragment is returned instead. This is another built-in class.
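
As a rough illustration of that compatibility, the snippet below hands a `DOMDocument` to SimpleXML and to `DOMXPath`, then builds a `DOMDocumentFragment`. It uses only PHP built-ins. The document is loaded from a hard-coded string purely as a stand-in for parser output, and the variable names are arbitrary.

```php
<?php
// Stand-in for parser output: a small document built from a literal string.
$doc = new DOMDocument();
$doc->loadXML('<html><body><p class="intro">Hello</p><p>World</p></body></html>');

// SimpleXML can wrap an existing DOM tree without re-parsing the markup.
$sx = simplexml_import_dom($doc);
$first = (string) $sx->body->p[0];

// DOMXPath queries the same document object.
$xpath = new DOMXPath($doc);
$count = $xpath->query('//p')->length;

// A fragment holds sibling nodes that have no single root element.
$fragment = $doc->createDocumentFragment();
$fragment->appendChild($doc->createElement('em', 'one'));
$fragment->appendChild($doc->createTextNode(' and two'));
```

A fragment is owned by a document but is not part of its tree until it is appended somewhere, which is why it suits markup that lacks a single root element.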