Usage of basic components
=========================

This document explains how to use the parser, the pretty printer and the node traverser.

Bootstrapping
-------------

To bootstrap the library, include the autoloader generated by composer:

```php
require 'path/to/vendor/autoload.php';
```

Additionally, you may want to set the `xdebug.max_nesting_level` ini option to a higher value:

```php
ini_set('xdebug.max_nesting_level', 3000);
```

This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable Xdebug completely, as it can easily make this library more than five times
slower.

Parsing
-------

In order to parse code, you first have to create a parser instance:

```php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
```

The factory accepts a kind argument that determines how different PHP versions are treated:

Kind | Behavior
-----|---------
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.

Unless you have a strong reason to use something else, `PREFER_PHP7` is a reasonable default.

The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).
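
As a hedged illustration of the second argument, the sketch below passes a lexer that records byte offsets in addition to the default attributes. It assumes PHP-Parser 4.x, where `PhpParser\Lexer\Emulative` accepts a `usedAttributes` option; the autoloader path and the sample code string are placeholders.

```php
<?php
require 'path/to/vendor/autoload.php'; // placeholder path, adjust to your project

use PhpParser\Lexer;
use PhpParser\ParserFactory;

// Assumption: PHP-Parser 4.x lexer options. Besides the default attributes,
// request the byte offsets of each node within the source string.
$lexer = new Lexer\Emulative([
    'usedAttributes' => ['comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'],
]);

// ONLY_PHP7 skips the PHP 5 fallback that PREFER_PHP7 would attempt.
$parser = (new ParserFactory)->create(ParserFactory::ONLY_PHP7, $lexer);

$stmts = $parser->parse('<?php echo 1;');

// The extra attributes are now available on every node.
$startPos = $stmts[0]->getAttribute('startFilePos');
$endPos   = $stmts[0]->getAttribute('endFilePos');
echo "echo statement spans bytes $startPos to $endPos\n";
```

Requesting only the attributes you need keeps the node objects small; which ones are worth enabling depends on the use case.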

Subsequently you can pass PHP code (including the opening `<?php` tag) to the parser in order to
create an abstract syntax tree (AST). If a syntax error is encountered, a `PhpParser\Error` exception
will be thrown:

```php
use PhpParser\Error;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
function printLine($msg) {
    echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

A parser instance can be reused to parse multiple files.

Node dumping
------------

To dump the abstract syntax tree in human-readable form, a `NodeDumper` can be used:

```php
use PhpParser\NodeDumper;

$nodeDumper = new NodeDumper;
echo $nodeDumper->dump($stmts), "\n";
```

For the sample code from the previous section, this will produce the following output:

```
array(
    0: Stmt_Function(
        byRef: false
        name: Identifier(
            name: printLine
        )
        params: array(
            0: Param(
                type: null
                byRef: false
                variadic: false
                var: Expr_Variable(
                    name: msg
                )
                default: null
            )
        )
        returnType: null
        stmts: array(
            0: Stmt_Echo(
                exprs: array(
                    0: Expr_Variable(
                        name: msg
                    )
                    1: Scalar_String(
                        value:

                    )
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_FuncCall(
            name: Name(
                parts: array(
                    0: printLine
                )
            )
            args: array(
                0: Arg(
                    value: Scalar_String(
                        value: Hello World!!!
                    )
                    byRef: false
                    unpack: false
                )
            )
        )
    )
)
```

You can also use the `php-parse` script to obtain such a node dump by calling it either with a file
name or code string:

```sh
vendor/bin/php-parse file.php
vendor/bin/php-parse "<?php foo();"
```

This can be very helpful if you want to quickly check how certain syntax is represented in the AST.

Node tree structure
-------------------

Looking at the node dump above, you can see that `$stmts` for this example code is an array of two
nodes, a `Stmt_Function` and a `Stmt_Expression`. The corresponding class names are:

 * `Stmt_Function -> PhpParser\Node\Stmt\Function_`
 * `Stmt_Expression -> PhpParser\Node\Stmt\Expression`

The additional `_` at the end of the first class name is necessary, because `Function` is a
reserved keyword. Many node class names in this library have a trailing `_` to avoid clashing with
a keyword.

As PHP is a large language, there are approximately 140 different nodes. In order to make working
with them easier, they are grouped into three categories:

 * `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
   a value and cannot occur in an expression. For example, a class definition is a statement.
   It doesn't return a value and you can't write something like `func(class A {});`.
 * `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
   and thus can occur in other expressions. Examples of expressions are `$var`
   (`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
 * `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
   (`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
   like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
   `PhpParser\Node\Expr`, as scalars are expressions, too.
 * There are some nodes that belong to none of these groups, for example names (`PhpParser\Node\Name`)
   and call arguments (`PhpParser\Node\Arg`).
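
The grouping above maps directly onto the class hierarchy, so it can be checked with `instanceof`. The following is a minimal sketch, assuming PHP-Parser 4.x constructors; the nodes are built by hand purely for illustration and the autoloader path is a placeholder.

```php
<?php
require 'path/to/vendor/autoload.php'; // placeholder path

use PhpParser\Node;

// Build a few nodes by hand; no parsing is needed to inspect the class hierarchy.
$nodes = [
    new Node\Expr\Variable('var'),
    new Node\Scalar\String_('string'),
    new Node\Scalar\LNumber(0),
    new Node\Name('Foo'),
    new Node\Stmt\Echo_([new Node\Scalar\String_('hi')]),
];

$categories = [];
foreach ($nodes as $node) {
    if ($node instanceof Node\Stmt) {
        $category = 'statement';
    } elseif ($node instanceof Node\Scalar) {
        // Checked before Expr, because every Scalar is also an Expr.
        $category = 'scalar';
    } elseif ($node instanceof Node\Expr) {
        $category = 'expression';
    } else {
        $category = 'other';
    }
    $categories[get_class($node)] = $category;
    echo get_class($node), ' => ', $category, "\n";
}
```

The order of the checks matters: testing for `Node\Expr` first would classify scalars as plain expressions.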

The `Node\Stmt\Expression` node is somewhat confusing in that it contains both the terms "statement"
and "expression". This node distinguishes `expr`, which is a `Node\Expr`, from `expr;`, which is
an "expression statement" represented by `Node\Stmt\Expression` and containing `expr` as a sub-node.

Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode, `exprs`. So in order to access it
in the above example you would write `$stmts[0]->stmts[0]->exprs`. If you wanted to access the name of
the function call, you would write `$stmts[1]->expr->name`.

All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and with `\` replaced by `_`. It also does not contain a trailing
`_` for reserved-keyword class names.

It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.

By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
of `PhpParser\Comment[\Doc]` instances.

The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.

Pretty printer
--------------

The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
information, the formatting is done using a specified scheme. Currently there is only one scheme available,
namely `PhpParser\PrettyPrinter\Standard`.

```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;

try {
    // parse
    $stmts = $parser->parse($code);

    // change
    $stmts[0]         // the echo statement
          ->exprs     // sub expressions
          [0]         // the first of them (the string node)
          ->value     // its value, i.e. 'Hi '
          = 'Hello '; // change to 'Hello '

    // pretty print
    $code = $prettyPrinter->prettyPrint($stmts);

    echo $code;
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The above code will output:

    echo 'Hello ', hi\getTarget();

As you can see, the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
converted back to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.

The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.

The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.

> Read more: [Pretty printing documentation](component/Pretty_printing.markdown)

Node traversal
--------------

The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change or analyze code in a generic way, where you don't know what the node tree is
going to look like.

For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:

```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

// add your visitor
$traverser->addVisitor(new MyNodeVisitor);

try {
    $code = file_get_contents($fileName);

    // parse
    $stmts = $parser->parse($code);

    // traverse
    $stmts = $traverser->traverse($stmts);

    // pretty print
    $code = $prettyPrinter->prettyPrintFile($stmts);

    echo $code;
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The corresponding node visitor might look like this:

```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;

class MyNodeVisitor extends NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = 'foo';
        }
    }
}
```

The above node visitor would change all string literals in the program to `'foo'`.
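
Visitors are not limited to rewriting nodes; they can also just collect information. The sketch below is a hypothetical example (the class name `FunctionCallCollector` and the sample code string are made up for illustration) that records the names of all statically named function calls. It assumes PHP-Parser 4.x and a placeholder autoloader path.

```php
<?php
require 'path/to/vendor/autoload.php'; // placeholder path

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical analysis visitor: gathers function names without changing the tree.
class FunctionCallCollector extends NodeVisitorAbstract
{
    public $names = [];

    public function enterNode(Node $node) {
        // Calls like $f() have an expression as name and are skipped here.
        if ($node instanceof Node\Expr\FuncCall && $node->name instanceof Node\Name) {
            $this->names[] = $node->name->toString();
        }
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts  = $parser->parse('<?php printLine(strtoupper("hi")); $f();');

$collector = new FunctionCallCollector;
$traverser = new NodeTraverser;
$traverser->addVisitor($collector);
$traverser->traverse($stmts);

echo implode(', ', $collector->names), "\n";
```

Because the names are recorded when a node is entered, outer calls are listed before the calls nested in their arguments.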

All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:

```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```

The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversal or
preparing the tree for traversal.

The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.

The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.

All four methods can either return the changed node or not return at all (i.e. `null`), in which
case the current node is not changed.

The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node. To furthermore prevent subsequent
visitors from visiting the current node, `NodeTraverser::DONT_TRAVERSE_CURRENT_AND_CHILDREN` can be used instead.

The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
case the current node will be removed from the parent array. Furthermore, it is possible to return
an array of nodes, which will be merged into the parent array at the offset of the current node.
For example, if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)`, the result will
be `array(A, X, Y, Z, C)`.

Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.

> Read more: [Walking the AST](component/Walking_the_AST.markdown)

The NameResolver node visitor
-----------------------------

One visitor that is already bundled with the package is `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.

For example, consider the following code:

    use A as B;
    new B\C();

In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.

After running it, most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function they are referring to. In most cases this is a non-issue, as the global functions
are meant.

The `NameResolver` also adds a `namespacedName` subnode to class, function and constant declarations
that contains the namespaced name instead of only the short name that is available via `name`.
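
The `use A as B; new B\C();` case from above can be tried out directly. This is a minimal sketch, assuming PHP-Parser 4.x with the default `NameResolver` behaviour of replacing name nodes in place; the autoloader path is a placeholder.

```php
<?php
require 'path/to/vendor/autoload.php'; // placeholder path

use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\ParserFactory;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts  = $parser->parse('<?php use A as B; new B\C();');

$traverser = new NodeTraverser;
$traverser->addVisitor(new NameResolver);
$stmts = $traverser->traverse($stmts);

// $stmts[0] is the use statement, $stmts[1] the expression statement wrapping `new`.
$class    = $stmts[1]->expr->class;
$resolved = $class->toString();

echo get_class($class), ': ', $resolved, "\n";
```

After the traversal the class name of the `new` expression is a fully qualified name node, so later visitors no longer need to know about the `use` alias.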

> Read more: [Name resolution documentation](component/Name_resolution.markdown)

Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------

A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on PHP 5.2, i.e. names like `A\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.

We start off with the following base code:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;

$inDir  = '/some/path';
$outDir = '/some/other/path';

$parser        = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor

// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');

foreach ($files as $file) {
    try {
        // read the file that should be converted
        $code = file_get_contents($file->getPathname());

        // parse
        $stmts = $parser->parse($code);

        // traverse
        $stmts = $traverser->traverse($stmts);

        // pretty print
        $code = $prettyPrinter->prettyPrintFile($stmts);

        // write the converted file to the target directory
        file_put_contents(
            substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
            $code
        );
    } catch (PhpParser\Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }
}
```

Now let's start with the main code, the `NamespaceConverter` node visitor. One thing it needs to do
is convert `A\B` style names to `A_B` style ones.

```php
use PhpParser\Node;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        }
    }
}
```

The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `str_replace('\\', '_', $node->toString())` does.
(If you want to create a name with backslashes either write `$node->toString()` or `(string) $node`.)
Then we create a new name from the string and return it. Returning a new node replaces the old node.

Another thing we need to do is change the class/function/const declarations. Currently they contain
only the short name (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        }
    }
}
```

There is not much more to it than converting the namespaced name to a string with `_` as separator.
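
The string conversion itself can be tried in isolation on a hand-built name node. This is only a small sketch; the name `A\B\C` is an arbitrary example and the autoloader path is a placeholder.

```php
<?php
require 'path/to/vendor/autoload.php'; // placeholder path

use PhpParser\Node\Name;

// An arbitrary example name; in the visitor this would be $node->namespacedName.
$name = new Name('A\B\C');

// toString() joins the name parts with backslashes, which are then swapped for underscores.
$converted = str_replace('\\', '_', $name->toString());

echo $converted, "\n"; // prints A_B_C
```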

The last thing we need to do is remove the `namespace` and `use` statements:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        } elseif ($node instanceof Stmt\Namespace_) {
            // returning an array merges it into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            // remove use nodes altogether
            return NodeTraverser::REMOVE_NODE;
        }
    }
}
```

That's all.
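
To see the removal step on its own, the following sketch runs a stripped-down visitor that only flattens `namespace` blocks and drops `use` statements. It is an illustration under assumptions (PHP 7 anonymous classes, PHP-Parser 4.x, a made-up input string, a placeholder autoloader path), not the complete converter.

```php
<?php
require 'path/to/vendor/autoload.php'; // placeholder path

use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts  = $parser->parse('<?php namespace A; use B\C; echo 1;');

$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Stmt\Namespace_) {
            // Replace the namespace node with the statements it contains.
            return $node->stmts;
        }
        if ($node instanceof Stmt\Use_) {
            // Drop the use statement from its parent array.
            return NodeTraverser::REMOVE_NODE;
        }
    }
});
$stmts = $traverser->traverse($stmts);

$remaining = count($stmts);
echo (new PrettyPrinter\Standard)->prettyPrint($stmts), "\n";
```

Since `leaveNode()` runs after a node's children have been processed, the `use` statement is already gone from `$node->stmts` by the time the namespace node itself is replaced.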