annotate vendor/nikic/php-parser/doc/0_Introduction.markdown @ 19:fa3358dc1485 tip

Add ndrum files
author Chris Cannam
date Wed, 28 Aug 2019 13:14:47 +0100
parents 129ea1e6d783
Introduction
============

This project is a PHP 5.2 to PHP 7.3 parser **written in PHP itself**.

What is this for?
-----------------

A parser is useful for [static analysis][0], code manipulation, and essentially any other
application that deals with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code, which lets you work with it in an abstract and robust way.

There are other ways of processing source code. One that PHP supports natively is the
token stream generated by [`token_get_all`][2]. The token stream is much lower level than
the AST and thus has different applications: it also lets you analyze the exact formatting of
a file. On the other hand, the token stream is much harder to work with for more complex analysis.
For example, an AST abstracts away the fact that, in PHP, variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.
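
To make the difference concrete, here is a minimal sketch that uses only PHP's built-in `token_get_all` and `token_name` functions. The helper name `describeTokens` is made up for this illustration. It prints the tokens for three of the variable syntaxes mentioned above.

```php
<?php
// Hypothetical helper: returns the token name for array tokens and the
// literal character for single-character tokens, skipping whitespace.
function describeTokens(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_WHITESPACE) {
                continue; // whitespace is irrelevant to this comparison
            }
            $names[] = token_name($token[0]);
        } else {
            $names[] = $token; // e.g. '$', '{', '}', ';'
        }
    }
    return $names;
}

foreach (['<?php $foo;', '<?php $$bar;', "<?php \${'foobar'};"] as $code) {
    echo $code, ' => ', implode(' ', describeTokens($code)), "\n";
}
```

The plain `$foo` arrives as a single `T_VARIABLE` token, while `$$bar` is split into a bare `$` character followed by a `T_VARIABLE`. Code that works on tokens has to reassemble such cases itself, which is the work the AST does for you.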

Another question is: why would you want a PHP parser *written in PHP*? PHP might not be
a language especially suited to fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore, the people most likely to want
programmatic PHP code analysis are PHP developers, not C developers.

What can it parse?
------------------

The parser supports PHP 5.2 through 7.3.

Because the parser is based on the tokens returned by `token_get_all` (which can only lex the PHP
version it runs on), a wrapper that emulates tokens from newer versions is also provided.
This makes it possible to parse PHP 7.3 source code while running on PHP 7.0, for example. The emulation
is somewhat hacky and not perfect, but it should work well on any sane code.

What output does it produce?
----------------------------

The parser produces an [Abstract Syntax Tree][1] (AST), also known as a node tree. This is
best seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly like this:

```
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi
            )
            1: Scalar_String(
                value: World
            )
        )
    )
)
```

This matches the structure of the code: an echo statement that takes two strings as expressions,
with the values `Hi` and `World`.
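
A dump like the one above could be produced roughly as follows. This is a sketch, not a definitive recipe: it assumes the 4.x series of the library (which matches the PHP 5.2 to 7.3 range stated here) installed through Composer, and the autoloader path is an assumption about your project layout.

```php
<?php
// Assumption: nikic/php-parser 4.x was installed with Composer, so this
// autoloader path exists relative to the script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Error;
use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$code = "<?php echo 'Hi', 'World';";

// PREFER_PHP7 tries the PHP 7 grammar first and falls back to PHP 5.
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

try {
    // parse() returns an array of statement nodes, or throws on a syntax error.
    $stmts = $parser->parse($code);
} catch (Error $e) {
    echo 'Parse error: ', $e->getMessage(), "\n";
    exit(1);
}

// NodeDumper renders the node tree in a human-readable form.
echo (new NodeDumper)->dump($stmts), "\n";
```

The class and constant names follow the 4.x API; other major versions of the library may name or construct these objects differently.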

You can also see that the AST does not contain any whitespace information (but most comments are saved),
so it cannot be used for formatting analysis.

What else can it do?
--------------------

Apart from the parser itself, this package also bundles support for some related features:

 * Support for pretty printing, which is the act of converting an AST into PHP code. Please note
   that "pretty printing" does not imply that the output is especially pretty. That's just what
   it's called ;)
 * Support for serializing and unserializing the node tree to JSON
 * Support for dumping the node tree in a human-readable form (see the section above for an
   example of what the output looks like)
 * Infrastructure for traversing and changing the AST (node traverser and node visitors)
 * A node visitor for resolving namespaced names
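
Several of these features can be combined in a few lines. The following is a hedged sketch that again assumes the 4.x API and a Composer autoloader at the path shown; the visitor class `UppercaseStrings` is hypothetical and exists only for this example. It parses a snippet, rewrites every string literal with a node visitor, pretty prints the result, and serializes the tree to JSON.

```php
<?php
// Assumption: nikic/php-parser 4.x installed with Composer at this path.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// Hypothetical visitor: upper-cases every string literal it encounters.
class UppercaseStrings extends NodeVisitorAbstract
{
    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = strtoupper($node->value);
        }
        return null; // null keeps the (possibly modified) node in place
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse("<?php echo 'Hi', 'World';");

// Walk the tree and let the visitor modify it.
$traverser = new NodeTraverser();
$traverser->addVisitor(new UppercaseStrings());
$stmts = $traverser->traverse($stmts);

// Convert the modified AST back into PHP code.
$printed = (new Standard)->prettyPrintFile($stmts);
echo $printed, "\n";

// Nodes implement JsonSerializable in 4.x, so json_encode() works directly.
$json = json_encode($stmts);
```

Because the printer regenerates the code from the AST alone, the original formatting is not preserved, which is consistent with the note above that the tree carries no whitespace information.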

 [0]: http://en.wikipedia.org/wiki/Static_program_analysis
 [1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
 [2]: http://php.net/token_get_all