Usage of basic components
=========================

This document explains how to use the parser, the pretty printer and the node traverser.

Bootstrapping
-------------

To bootstrap the library, include the autoloader generated by Composer:

```php
require 'path/to/vendor/autoload.php';
```

Additionally, you may want to set the `xdebug.max_nesting_level` ini option to a higher value:

```php
ini_set('xdebug.max_nesting_level', 3000);
```

This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable Xdebug completely, as it can easily make this library more than five times
slower.
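
If Xdebug may or may not be present on a given machine, the setting can be applied conditionally.
A minimal sketch, assuming the limit of 3000 shown above suits your code base:

```php
// Only touch the setting when the Xdebug extension is actually loaded.
if (extension_loaded('xdebug')) {
    ini_set('xdebug.max_nesting_level', '3000');
}
```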

Parsing
-------

In order to parse code, you first have to create a parser instance:

```php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
```

The factory accepts a kind argument that determines how different PHP versions are treated:

Kind | Behavior
-----|---------
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.

Unless you have a strong reason to use something else, `PREFER_PHP7` is a reasonable default.

The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).
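
As an illustration of the second argument, the sketch below passes a lexer configured through its
`usedAttributes` option so that nodes also carry file offsets. It assumes the parser API described in
this document and a Composer autoloader at `vendor/autoload.php`; the attribute list is only an example.

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Lexer;
use PhpParser\ParserFactory;

// Ask the lexer to record file offsets in addition to lines and comments.
$lexer = new Lexer([
    'usedAttributes' => ['comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'],
]);

// Parse strictly as PHP 7, using the customized lexer.
$parser = (new ParserFactory)->create(ParserFactory::ONLY_PHP7, $lexer);
$stmts = $parser->parse('<?php echo 1;');

// Byte offset at which the echo statement starts in the source string.
$offset = $stmts[0]->getAttribute('startFilePos');
echo $offset, "\n";
```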

Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
create a syntax tree. If a syntax error is encountered, a `PhpParser\Error` exception will be thrown:

```php
use PhpParser\Error;
use PhpParser\ParserFactory;

$code = '<?php // some code';
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

A parser instance can be reused to parse multiple files.
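
Reusing one parser for several inputs might look like the following sketch. The file names and
sources are made up for illustration, and the autoloader path is an assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Error;
use PhpParser\ParserFactory;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

// Hypothetical inputs; the second one contains a syntax error on purpose.
$sources = [
    'ok.php'     => '<?php echo 1;',
    'broken.php' => '<?php echo ;',
];

$results = [];
foreach ($sources as $name => $code) {
    try {
        // The same parser instance handles every input.
        $results[$name] = count($parser->parse($code)) . ' statement(s)';
    } catch (Error $e) {
        // A failure in one input does not prevent parsing the next one.
        $results[$name] = 'Parse Error: ' . $e->getMessage();
    }
}

print_r($results);
```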

Node tree
---------

If you use the above code with `$code = "<?php echo 'Hi ', hi\\getTarget();"` the parser will
generate a node tree looking like this:

```
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi
            )
            1: Expr_FuncCall(
                name: Name(
                    parts: array(
                        0: hi
                        1: getTarget
                    )
                )
                args: array(
                )
            )
        )
    )
)
```

Thus `$stmts` will contain an array with only one node, with this node being an instance of
`PhpParser\Node\Stmt\Echo_`.
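
A dump like the one above can be produced with the bundled `PhpParser\NodeDumper` class. A short
sketch, assuming the autoloader path below matches your setup:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse($code);

// Render the statement array as an indented, human-readable tree.
$dump = (new NodeDumper)->dump($stmts);
echo $dump, "\n";
```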

As PHP is a large language there are approximately 140 different nodes. To make working with them
easier, they are grouped into three categories:

* `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
  a value and cannot occur in an expression. For example a class definition is a statement.
  It doesn't return a value and you can't write something like `func(class A {});`.
* `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
  and thus can occur in other expressions. Examples of expressions are `$var`
  (`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
* `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
  (`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
  like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
  `PhpParser\Node\Expr`, as scalars are expressions, too.
* There are some nodes that belong to none of these groups, for example names (`PhpParser\Node\Name`)
  and call arguments (`PhpParser\Node\Arg`).

Some node class names have a trailing `_`. This is used whenever the class name would otherwise clash
with a PHP keyword.

Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode, `exprs`. So in order to access it
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
call, you would write `$stmts[0]->exprs[1]->name`.

All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and with `\` replaced by `_`. It also does not contain a trailing
`_` for reserved-keyword class names.
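
Subnode access and `getType()` can be tried out on the echo example. The following is a small
sketch; the variable names are arbitrary and the autoloader path is an assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\ParserFactory;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse("<?php echo 'Hi ', hi\\getTarget();");

$echo = $stmts[0];          // the Stmt\Echo_ node
$string = $echo->exprs[0];  // first echoed expression: the string literal
$call = $echo->exprs[1];    // second echoed expression: the function call

echo $echo->getType(), "\n";          // Stmt_Echo (no trailing underscore)
echo $string->getType(), "\n";        // Scalar_String
echo $call->getType(), "\n";          // Expr_FuncCall
echo $call->name->toString(), "\n";   // hi\getTarget
```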

It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.

By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
of `PhpParser\Comment[\Doc]` instances.

The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.
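
The attribute methods can be exercised as in the sketch below. The `reviewed` attribute and the
sample function are invented for this example, and the autoloader path is an assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
/** Says hello. */
function hello() {}
CODE;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse($code);
$function = $stmts[0];

// Attributes added by the lexer.
$line = $function->getLine();                  // same as getAttribute('startLine')
$doc = $function->getDocComment()->getText();  // text of the last doc comment

// Custom metadata; 'reviewed' is a made-up attribute name.
$function->setAttribute('reviewed', true);
$reviewed = $function->getAttribute('reviewed');

// The second argument is the default for attributes that are not set.
$missing = $function->getAttribute('doesNotExist', 'n/a');

echo $line, ' ', $doc, "\n";
```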

Pretty printer
--------------

The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
information, the formatting is done using a specified scheme. Currently there is only one scheme available,
namely `PhpParser\PrettyPrinter\Standard`.

```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

try {
    // parse
    $stmts = $parser->parse($code);

    // change
    $stmts[0]         // the echo statement
        ->exprs       // sub expressions
        [0]           // the first of them (the string node)
        ->value       // its value, i.e. 'Hi '
        = 'Hello ';   // change to 'Hello '

    // pretty print
    $code = $prettyPrinter->prettyPrint($stmts);

    echo $code;
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The above code will output:

    echo 'Hello ', hi\getTarget();

As you can see, the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
converted back to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.

The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.

The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.
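
The difference between the three printing methods can be seen side by side. This is a sketch under
the same assumptions as before (autoloader path, echo example as input):

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

$stmts = $parser->parse("<?php echo 'Hi ', hi\\getTarget();");

// A single expression: only the function call inside the echo statement.
$expr = $prettyPrinter->prettyPrintExpr($stmts[0]->exprs[1]);

// A list of statements, without an opening tag.
$body = $prettyPrinter->prettyPrint($stmts);

// A whole file, including the opening <?php tag.
$file = $prettyPrinter->prettyPrintFile($stmts);

echo $expr, "\n", $body, "\n", $file, "\n";
```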

Node traversal
--------------

The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change or analyze code in a generic way, where you don't know what the node tree is
going to look like.

For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:

```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

// add your visitor
$traverser->addVisitor(new MyNodeVisitor);

try {
    $code = file_get_contents($fileName);

    // parse
    $stmts = $parser->parse($code);

    // traverse
    $stmts = $traverser->traverse($stmts);

    // pretty print
    $code = $prettyPrinter->prettyPrintFile($stmts);

    echo $code;
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The corresponding node visitor might look like this:

```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;

class MyNodeVisitor extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = 'foo';
        }
    }
}
```

The above node visitor would change all string literals in the program to `'foo'`.

All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:

```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```

The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversal or
preparing the tree for traversal.

The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.

The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.
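
To make the calling order concrete, the sketch below records every `enterNode()` and `leaveNode()`
call. `OrderLogger` is a hypothetical visitor written for this example, and the autoloader path is
an assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor that only records the order of visitor calls.
class OrderLogger extends NodeVisitorAbstract
{
    public $events = [];

    public function enterNode(Node $node) {
        $this->events[] = 'enter ' . $node->getType();
    }

    public function leaveNode(Node $node) {
        $this->events[] = 'leave ' . $node->getType();
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$logger = new OrderLogger;

$traverser = new NodeTraverser;
$traverser->addVisitor($logger);
$traverser->traverse($parser->parse('<?php echo foo();'));

// A node is left only after all of its subnodes have been entered and left.
echo implode("\n", $logger->events), "\n";
```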

All four methods can either return the changed node (or, for `beforeTraverse()` and `afterTraverse()`,
the changed array of nodes) or not return at all (i.e. `null`), in which case nothing is changed.

The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node.
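
As a sketch of skipping children, the hypothetical visitor below counts string literals but does
not descend into function declarations. The class name, the sample code and the autoloader path
are assumptions made for this example:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: counts string literals outside of function bodies.
class TopLevelStringCounter extends NodeVisitorAbstract
{
    public $count = 0;

    public function enterNode(Node $node) {
        if ($node instanceof Node\Stmt\Function_) {
            // Nothing below the function declaration will be visited.
            return NodeTraverser::DONT_TRAVERSE_CHILDREN;
        }
        if ($node instanceof Node\Scalar\String_) {
            $this->count++;
        }
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$counter = new TopLevelStringCounter;

$traverser = new NodeTraverser;
$traverser->addVisitor($counter);
$traverser->traverse($parser->parse("<?php echo 'a'; function f() { echo 'b'; }"));

echo $counter->count, "\n"; // only 'a' is counted
```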

The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
case the current node will be removed from the parent array. Furthermore it is possible to return
an array of nodes, which will be merged into the parent array at the offset of the current node.
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
be `array(A, X, Y, Z, C)`.
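
Removing nodes could look like the following sketch, which drops every `echo` statement.
`EchoRemover` is a hypothetical visitor, and the autoloader path is an assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

// Hypothetical visitor that removes all echo statements.
class EchoRemover extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Stmt\Echo_) {
            // The node is dropped from the array that contains it.
            return NodeTraverser::REMOVE_NODE;
        }
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$traverser->addVisitor(new EchoRemover);

$stmts = $parser->parse('<?php echo 1; foo(); echo 2;');
$stmts = $traverser->traverse($stmts);

// Only the call to foo() is left.
$remaining = (new PrettyPrinter\Standard)->prettyPrint($stmts);
echo $remaining, "\n";
```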

Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which defines empty default implementations for all the above methods.

The NameResolver node visitor
-----------------------------

One visitor is already bundled with the package: `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.

For example, consider the following code:

    use A as B;
    new B\C();

In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.

After running it most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function or constant they refer to. In most cases this is a non-issue, because the global
function or constant is the one intended.

The `NameResolver` also adds a `namespacedName` subnode to class, function and constant declarations.
It contains the namespaced name instead of only the short name that is available via `name`.
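
The alias example can be checked with a second visitor that runs after the `NameResolver`.
`NewCollector` is a hypothetical helper written for this sketch, and the autoloader path is an
assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: records the class name of every `new` expression.
class NewCollector extends NodeVisitorAbstract
{
    public $classes = [];

    public function leaveNode(Node $node) {
        if ($node instanceof Node\Expr\New_ && $node->class instanceof Node\Name) {
            $this->classes[] = $node->class->toString();
        }
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$collector = new NewCollector;

$traverser = new NodeTraverser;
$traverser->addVisitor(new NameResolver); // runs first, resolves B\C
$traverser->addVisitor($collector);       // then sees the resolved name

$traverser->traverse($parser->parse('<?php use A as B; new B\C();'));

echo implode(', ', $collector->classes), "\n"; // A\C
```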

Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------

A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on PHP 5.2, i.e. names like `A\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.

We start off with the following base code:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;

$inDir  = '/some/path';
$outDir = '/some/other/path';

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor

// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');

foreach ($files as $file) {
    try {
        // read the file that should be converted
        $code = file_get_contents($file);

        // parse
        $stmts = $parser->parse($code);

        // traverse
        $stmts = $traverser->traverse($stmts);

        // pretty print
        $code = $prettyPrinter->prettyPrintFile($stmts);

        // write the converted file to the target directory
        file_put_contents(
            substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
            $code
        );
    } catch (PhpParser\Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }
}
```

Now let's start with the main code, the `NamespaceConverter` visitor. One thing it needs to do
is convert `A\B` style names to `A_B` style ones.

```php
use PhpParser\Node;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name($node->toString('_'));
        }
    }
}
```

The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.
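
The string forms of a name can be compared in isolation. The sketch below avoids the separator
argument and uses plain `str_replace()` instead, which is a substitution made for this example and
not the approach of the visitor above; the autoloader path is likewise an assumption:

```php
require 'vendor/autoload.php'; // assumption: path to Composer's autoloader

use PhpParser\Node\Name;

// A name can be built from a backslash-separated string.
$name = new Name('A\B\C');

$viaMethod = $name->toString();  // backslash-separated name
$viaCast = (string) $name;       // same result through a string cast

// Underscore form, built with str_replace() rather than a separator argument.
$underscored = str_replace('\\', '_', $name->toString());

echo $viaMethod, ' ', $viaCast, ' ', $underscored, "\n";
```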

Another thing we need to do is change the class/function/const declarations. Currently they contain
only the short name (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name($node->toString('_'));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = $node->namespacedName->toString('_');
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = $const->namespacedName->toString('_');
            }
        }
    }
}
```

There is not much more to it than converting the namespaced name to a string with `_` as separator.

The last thing we need to do is remove the `namespace` and `use` statements:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name($node->toString('_'));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = $node->namespacedName->toString('_');
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = $const->namespacedName->toString('_');
            }
        } elseif ($node instanceof Stmt\Namespace_) {
            // returning an array merges it into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            // returning REMOVE_NODE removes the node altogether
            return NodeTraverser::REMOVE_NODE;
        }
    }
}
```

That's all.