Introduction
============

This project is a PHP 5.2 to PHP 7.3 parser **written in PHP itself**.

What is this for?
-----------------

A parser is useful for [static analysis][0], manipulation of code and basically any other
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.

There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much more low-level than
the AST and thus has different applications: it also allows analyzing the exact formatting of
a file. On the other hand, the token stream is much harder to deal with for more complex analysis.
For example, an AST abstracts away the fact that, in PHP, variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.
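
To make the difference concrete, here is a small sketch that uses only PHP's built-in tokenizer (the `tokenizer` extension, enabled in most builds). It prints the token stream for three of the variable spellings mentioned above. The helper `describeTokens()` is a hypothetical name chosen for this example and is not part of this project.

```php
<?php
// Compare the token streams PHP produces for three spellings of a variable.
// token_get_all() returns a list whose elements are either
// [token id, text, line] arrays or single-character strings such as ';'.

/**
 * Hypothetical helper: turn a token stream into a list of token names
 * (for array tokens) and literal characters (for single-char tokens).
 */
function describeTokens(string $code): array
{
    $out = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Skip the opening tag and whitespace to keep the output short.
            if ($token[0] === T_OPEN_TAG || $token[0] === T_WHITESPACE) {
                continue;
            }
            $out[] = token_name($token[0]);
        } else {
            $out[] = $token;
        }
    }
    return $out;
}

foreach (['<?php $foo;', '<?php $$bar;', '<?php ${\'foobar\'};'] as $code) {
    echo $code, ' => ', implode(' ', describeTokens($code)), "\n";
}
// Prints:
// <?php $foo; => T_VARIABLE ;
// <?php $$bar; => $ T_VARIABLE ;
// <?php ${'foobar'}; => $ { T_CONSTANT_ENCAPSED_STRING } ;
```

Each spelling yields a different token sequence, so a token-based tool has to recognize all of them itself. In this project's AST, as far as we understand it, all three are `Expr_Variable` nodes that differ only in their `name` subnode (a plain string, a nested variable, or a string expression).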

Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore, the people most likely to want to do
programmatic PHP code analysis are, incidentally, PHP developers, not C developers.

What can it parse?
------------------

The parser supports parsing PHP 5.2-7.3.

As the parser is based on the tokens returned by `token_get_all` (which can only lex the PHP
version it runs on), a wrapper for emulating tokens from newer versions is additionally provided.
This makes it possible to parse PHP 7.3 source code while running on PHP 7.0, for example. This
emulation is somewhat hacky and not perfect, but it should work well on any sane code.
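
A minimal sketch of using the emulation wrapper explicitly, assuming PHP-Parser 4.x installed via Composer (`composer require nikic/php-parser:^4`). The class `PhpParser\Lexer\Emulative` and the `ParserFactory::create()` signature used here are from the 4.x API; other major versions construct parsers differently. The input uses PHP 7.3's flexible heredoc syntax, whose indented closing marker the native lexer of older PHP versions does not recognize. As far as we know, 4.x already falls back to the emulative lexer when no lexer is passed, so passing it here mainly makes that choice visible.

```php
<?php
// Assumption: PHP-Parser 4.x is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Lexer;
use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

// PHP 7.3 flexible heredoc: the closing marker EOT is indented.
$code = <<<'CODE'
<?php
$greeting = <<<EOT
    Hello
    EOT;
CODE;

// Pass the emulative lexer so tokens of newer PHP versions are recognized
// even when this script runs on an older PHP 7.x.
$lexer  = new Lexer\Emulative();
$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7, $lexer);
$ast    = $parser->parse($code);

echo (new NodeDumper())->dump($ast), "\n";
```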

What output does it produce?
----------------------------

The parser produces an [Abstract Syntax Tree][1] (AST), also known as a node tree. What this looks
like is best seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly looking like this:

```
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi
            )
            1: Scalar_String(
                value: World
            )
        )
    )
)
```

This matches the structure of the code: an echo statement, which takes two strings as expressions,
with the values `Hi` and `World`.

You can also see that the AST does not contain any whitespace information (though most comments
are preserved), so it cannot be used for formatting analysis.
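
The node tree shown above can be reproduced with a few lines of code. This is a sketch assuming PHP-Parser 4.x installed via Composer; `ParserFactory::PREFER_PHP7` and `NodeDumper` are 4.x names, and newer major versions create parsers differently.

```php
<?php
// Assumption: PHP-Parser 4.x is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Error;
use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$code = "<?php echo 'Hi', 'World';";

// PREFER_PHP7: try the PHP 7 grammar first, fall back to the PHP 5 grammar.
$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);

try {
    $ast = $parser->parse($code); // array of statement nodes
} catch (Error $e) {
    fwrite(STDERR, "Parse error: {$e->getMessage()}\n");
    exit(1);
}

// NodeDumper renders the tree in a human-readable form like the dump above.
$dump = (new NodeDumper())->dump($ast);
echo $dump, "\n";
```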

What else can it do?
--------------------

Apart from the parser itself, this package also bundles support for some other, related features:

* Support for pretty printing, which is the act of converting an AST into PHP code. Please note
  that "pretty printing" does not imply that the output is especially pretty. It's just what it's
  called ;)
* Support for serializing and unserializing the node tree to JSON
* Support for dumping the node tree in a human-readable form (see the section above for an
  example of what the output looks like)
* Infrastructure for traversing and changing the AST (node traverser and node visitors)
* A node visitor for resolving namespaced names
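
Several of these features can be combined in one short script. The sketch below, again assuming PHP-Parser 4.x installed via Composer, resolves names with the bundled `NameResolver` visitor, pretty prints the result and round-trips the tree through JSON. The namespace `App` and the class `Vendor\Package\Logger` in the sample input are made up for illustration.

```php
<?php
// Assumption: PHP-Parser 4.x is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\JsonDecoder;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

// Hypothetical input: App and Vendor\Package\Logger are illustrative names.
$code = <<<'CODE'
<?php
namespace App;

use Vendor\Package\Logger;

new Logger();
CODE;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse($code);

// Traverse the AST with the NameResolver visitor, which replaces names
// such as `Logger` by their fully qualified form.
$traverser = new NodeTraverser();
$traverser->addVisitor(new NameResolver());
$ast = $traverser->traverse($ast);

// Pretty print the modified AST back into PHP code.
$printed = (new PrettyPrinter\Standard())->prettyPrintFile($ast);
echo $printed, "\n";

// Serialize the node tree to JSON and decode it back into nodes.
$json = json_encode($ast, JSON_PRETTY_PRINT);
$decoded = (new JsonDecoder())->decode($json);
```

With its default options, `NameResolver` replaces the name in `new Logger()` by a fully qualified name, so the printed code should contain `new \Vendor\Package\Logger()`.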

[0]: http://en.wikipedia.org/wiki/Static_program_analysis
[1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[2]: http://php.net/token_get_all