Walking the AST
===============

The most common way to work with the AST is by using a node traverser and one or more node visitors.
As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
`42` becomes `'42'`).

```php
use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};

$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\LNumber) {
            return new Node\Scalar\String_((string) $node->value);
        }
    }
});

$stmts = ...;
$modifiedStmts = $traverser->traverse($stmts);
```

Node visitors
-------------

Each node visitor implements an interface with the following four methods:

```php
interface NodeVisitor {
    public function beforeTraverse(array $nodes);
    public function enterNode(Node $node);
    public function leaveNode(Node $node);
    public function afterTraverse(array $nodes);
}
```

The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
respectively, and are passed the entire AST. They can be used to perform any necessary state
setup or cleanup.

The `enterNode()` method is called when a node is first encountered, before its children are
processed ("preorder"). The `leaveNode()` method is called after all children have been visited
("postorder").

For example, if we have the following excerpt of an AST

```
Expr_FuncCall(
    name: Name(
        parts: array(
            0: printLine
        )
    )
    args: array(
        0: Arg(
            value: Scalar_String(
                value: Hello World!!!
            )
            byRef: false
            unpack: false
        )
    )
)
```

then the enter/leave methods will be called in the following order:

```
enterNode(Expr_FuncCall)
enterNode(Name)
leaveNode(Name)
enterNode(Arg)
enterNode(Scalar_String)
leaveNode(Scalar_String)
leaveNode(Arg)
leaveNode(Expr_FuncCall)
```

A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
performs modifications based on that. At the time when `leaveNode` is called, all the code inside
the node will have already been visited and necessary information collected.

As you usually do not want to implement all four methods, it is recommended that you extend
`NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
empty default implementations.

Modifying the AST
-----------------

There are a number of ways in which the AST can be modified from inside a node visitor.
The first and simplest is to change AST properties directly inside the visitor:

```php
public function leaveNode(Node $node) {
    if ($node instanceof Node\Scalar\LNumber) {
        // increment all integer literals
        $node->value++;
    }
}
```

The second is to replace a node entirely by returning a new node:

```php
public function leaveNode(Node $node) {
    if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
        // Convert all $a && $b expressions into !($a && $b)
        return new Node\Expr\BooleanNot($node);
    }
}
```

Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
will also consider the children of the new node. If you aren't careful, this can lead to infinite
recursion. For example, let's take the previous code sample and use enterNode instead:

```php
public function enterNode(Node $node) {
    if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
        // Convert all $a && $b expressions into !($a && $b)
        return new Node\Expr\BooleanNot($node);
    }
}
```

Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
with `!!($a && $b)`. This will continue until PHP hits the memory limit.

Finally, two special replacement types are supported only by leaveNode.
The first is removal of a node:

```php
public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Return_) {
        // Remove all return statements
        return NodeTraverser::REMOVE_NODE;
    }
}
```

Node removal only works if the parent structure is an array. This means that usually it only makes
sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).

On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
generally want to remove it together with a surrounding expression statement:

```php
public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Expression
        && $node->expr instanceof Node\Expr\FuncCall
        && $node->expr->name instanceof Node\Name
        && $node->expr->name->toString() === 'var_dump'
    ) {
        return NodeTraverser::REMOVE_NODE;
    }
}
```

This example will remove all calls to `var_dump()` which occur as expression statements. This means
that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
obvious way in which it can be removed).

Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this
only works inside leaveNode and only if the parent structure is an array.

```php
public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
        // Convert "return foo();" into "$retval = foo(); return $retval;"
        $var = new Node\Expr\Variable('retval');
        return [
            new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
            new Node\Stmt\Return_($var),
        ];
    }
}
```

Short-circuiting traversal
--------------------------

An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
especially if you have more than one visitor. In some cases, it is possible to avoid a full
traversal.

If you are looking for all class declarations in a file (and assuming you're not interested in
anonymous classes), you know that once you've seen a class declaration, there is no point in also
checking all its child nodes, because PHP does not allow nesting classes. In this case, you can
instruct the traverser to not recurse into the class node:

```php
private $classes = [];
public function enterNode(Node $node) {
    if ($node instanceof Node\Stmt\Class_) {
        $this->classes[] = $node;
        return NodeTraverser::DONT_TRAVERSE_CHILDREN;
    }
}
```

Of course, this option is only available in enterNode, because it's already too late by the time
leaveNode is reached.

If you are only looking for one specific node, it is also possible to abort the traversal entirely
after finding it.
For example, if you are looking for the node of a class with a certain name (and
discounting exotic cases like conditionally defining a class two times), you can stop traversal
once you have found it:

```php
private $class = null;
public function enterNode(Node $node) {
    if ($node instanceof Node\Stmt\Class_ &&
        $node->namespacedName->toString() === 'Foo\Bar\Baz'
    ) {
        $this->class = $node;
        return NodeTraverser::STOP_TRAVERSAL;
    }
}
```

This works both in enterNode and leaveNode. Note that this particular case can also be more easily
handled using a NodeFinder, which will be introduced below.

Multiple visitors
-----------------

A single traverser can be used with multiple visitors:

```php
$traverser = new NodeTraverser;
$traverser->addVisitor($visitorA);
$traverser->addVisitor($visitorB);
$stmts = $traverser->traverse($stmts);
```

It is important to understand that if a traverser is run with multiple visitors, the visitors will
be interleaved.
Given the following AST excerpt

```
Stmt_Return(
    expr: Expr_Variable(
        name: foobar
    )
)
```

the following method calls will be performed:

```
$visitorA->enterNode(Stmt_Return)
$visitorB->enterNode(Stmt_Return)
$visitorA->enterNode(Expr_Variable)
$visitorB->enterNode(Expr_Variable)
$visitorA->leaveNode(Expr_Variable)
$visitorB->leaveNode(Expr_Variable)
$visitorA->leaveNode(Stmt_Return)
$visitorB->leaveNode(Stmt_Return)
```

That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
Running multiple visitors in parallel improves performance, as the AST only has to be traversed
once. However, it is not always possible to write visitors in a way that allows interleaved
execution. In this case, you can always fall back to performing multiple traversals:

```php
$traverserA = new NodeTraverser;
$traverserA->addVisitor($visitorA);
$traverserB = new NodeTraverser;
$traverserB->addVisitor($visitorB);
$stmts = $traverserA->traverse($stmts);
$stmts = $traverserB->traverse($stmts);
```

When using multiple visitors, it is important to understand how they interact with the various
special enterNode/leaveNode return values:

 * If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
   visitors.
 * If *any* visitor returns `DONT_TRAVERSE_CURRENT_AND_CHILDREN`, the children will be skipped for
   *all* visitors, and all *subsequent* visitors will not visit the current node.
 * If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
 * If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
   not the original one.
 * If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
 * If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
   that was replaced, nor the replacement nodes.

Simple node finding
-------------------

While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
satisfy a certain callback, or which are instances of a certain node type. A couple of examples are
shown in the following:

```php
use PhpParser\{Node, NodeFinder};

$nodeFinder = new NodeFinder;

// Find all class nodes.
$classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);

// Find all classes that extend another class
$extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
    return $node instanceof Node\Stmt\Class_
        && $node->extends !== null;
});

// Find first class occurring in the AST. Returns null if no class exists.
$class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);

// Find first class that has name $name
// (namespacedName is only set if the AST has been run through the name resolver)
$class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
    return $node instanceof Node\Stmt\Class_
        && $node->namespacedName->toString() === $name;
});
```

Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
common use case.
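
To make the relationship between the finder and the traverser more concrete, the following is a
minimal sketch of how a `findFirst()`-style lookup could be written by hand. `FirstMatchVisitor` is
a hypothetical name chosen for this illustration and is not meant to mirror the library's internal
implementation. The snippet assumes PHP-Parser is installed through Composer, and it builds a tiny
AST by hand instead of parsing code.

```php
<?php
require 'vendor/autoload.php'; // assumption: PHP-Parser installed via Composer

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};

// Hypothetical visitor: remembers the first node accepted by the filter callback.
class FirstMatchVisitor extends NodeVisitorAbstract {
    private $filter;
    private $found = null;

    public function __construct(callable $filter) {
        $this->filter = $filter;
    }

    public function beforeTraverse(array $nodes) {
        // Reset state so the visitor can be reused for another traversal.
        $this->found = null;
    }

    public function enterNode(Node $node) {
        if (($this->filter)($node)) {
            $this->found = $node;
            // Nothing else to look for, so abort the whole traversal.
            return NodeTraverser::STOP_TRAVERSAL;
        }
    }

    public function getFound() {
        return $this->found;
    }
}

// Hand-built AST roughly equivalent to "class A {} class B {}".
$stmts = [new Node\Stmt\Class_('A'), new Node\Stmt\Class_('B')];

$visitor = new FirstMatchVisitor(function (Node $node) {
    return $node instanceof Node\Stmt\Class_;
});
$traverser = new NodeTraverser;
$traverser->addVisitor($visitor);
$traverser->traverse($stmts);

$first = $visitor->getFound(); // the Class_ node for "A"
```

With the `NodeFinder`, the same lookup is a single `findFirstInstanceOf()` call, which is why it is
usually the more convenient choice for small tasks like this.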

Parent and sibling references
-----------------------------

The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
be accessed: From parents to children. However, it can often be convenient to operate in the
reverse direction: When working on a node, you might want to check if the parent node satisfies a
certain property.

PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.
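
As a rough sketch of what such a visitor could look like: it keeps a stack of the nodes currently
being visited and stores the top of the stack as an attribute on every node it enters. The class
name `ParentConnector` and the attribute key `'parent'` are choices made for this example only. The
snippet assumes PHP-Parser is installed through Composer and uses a hand-built AST for
`return $foobar;`.

```php
<?php
require 'vendor/autoload.php'; // assumption: PHP-Parser installed via Composer

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};

// Illustrative visitor: records each node's parent in a 'parent' attribute.
class ParentConnector extends NodeVisitorAbstract {
    private $stack = [];

    public function beforeTraverse(array $nodes) {
        // Start every traversal with an empty stack.
        $this->stack = [];
    }

    public function enterNode(Node $node) {
        if (!empty($this->stack)) {
            // The node on top of the stack is the one we are currently inside of.
            $node->setAttribute('parent', $this->stack[count($this->stack) - 1]);
        }
        $this->stack[] = $node;
    }

    public function leaveNode(Node $node) {
        // Done with this node and its children.
        array_pop($this->stack);
    }
}

// Hand-built AST for "return $foobar;".
$var = new Node\Expr\Variable('foobar');
$return = new Node\Stmt\Return_($var);

$traverser = new NodeTraverser;
$traverser->addVisitor(new ParentConnector);
$traverser->traverse([$return]);

$parentOfVar = $var->getAttribute('parent');       // the Return_ node
$parentOfReturn = $return->getAttribute('parent'); // null, top-level node
```

Because the references are stored as attributes, later visitors (or a later traversal) can read
them with `getAttribute('parent')`. Keep in mind that they are not updated automatically if nodes
are replaced or moved afterwards.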