annotate vendor/nikic/php-parser/doc/component/Walking_the_AST.markdown @ 19:fa3358dc1485 tip

Add ndrum files
author Chris Cannam
date Wed, 28 Aug 2019 13:14:47 +0100
parents 129ea1e6d783
children
rev   line source
Chris@13 1 Walking the AST
Chris@13 2 ===============
Chris@13 3
Chris@13 4 The most common way to work with the AST is by using a node traverser and one or more node visitors.
Chris@13 5 As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
Chris@13 6 `42` becomes `'42'`.)
Chris@13 7
Chris@13 8 ```php
Chris@13 9 use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};
Chris@13 10
Chris@13 11 $traverser = new NodeTraverser;
Chris@13 12 $traverser->addVisitor(new class extends NodeVisitorAbstract {
Chris@13 13 public function leaveNode(Node $node) {
Chris@13 14 if ($node instanceof Node\Scalar\LNumber) {
Chris@13 15 return new Node\Scalar\String_((string) $node->value);
Chris@13 16 }
Chris@13 17 }
Chris@13 18 });
Chris@13 19
Chris@13 20 $stmts = ...;
Chris@13 21 $modifiedStmts = $traverser->traverse($stmts);
Chris@13 22 ```
Chris@13 23
Chris@13 24 Node visitors
Chris@13 25 -------------
Chris@13 26
Chris@13 27 Each node visitor implements an interface with following four methods:
Chris@13 28
Chris@13 29 ```php
Chris@13 30 interface NodeVisitor {
Chris@13 31 public function beforeTraverse(array $nodes);
Chris@13 32 public function enterNode(Node $node);
Chris@13 33 public function leaveNode(Node $node);
Chris@13 34 public function afterTraverse(array $nodes);
Chris@13 35 }
Chris@13 36 ```
Chris@13 37
Chris@13 38 The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
Chris@13 39 respectively, and are passed the entire AST. They can be used to perform any necessary state
Chris@13 40 setup or cleanup.
Chris@13 41
Chris@13 42 The `enterNode()` method is called when a node is first encountered, before its children are
Chris@13 43 processed ("preorder"). The `leaveNode()` method is called after all children have been visited
Chris@13 44 ("postorder").
Chris@13 45
Chris@13 46 For example, if we have the following excerpt of an AST
Chris@13 47
Chris@13 48 ```
Chris@13 49 Expr_FuncCall(
Chris@13 50 name: Name(
Chris@13 51 parts: array(
Chris@13 52 0: printLine
Chris@13 53 )
Chris@13 54 )
Chris@13 55 args: array(
Chris@13 56 0: Arg(
Chris@13 57 value: Scalar_String(
Chris@13 58 value: Hello World!!!
Chris@13 59 )
Chris@13 60 byRef: false
Chris@13 61 unpack: false
Chris@13 62 )
Chris@13 63 )
Chris@13 64 )
Chris@13 65 ```
Chris@13 66
Chris@13 67 then the enter/leave methods will be called in the following order:
Chris@13 68
Chris@13 69 ```
Chris@13 70 enterNode(Expr_FuncCall)
Chris@13 71 enterNode(Name)
Chris@13 72 leaveNode(Name)
Chris@13 73 enterNode(Arg)
Chris@13 74 enterNode(Scalar_String)
Chris@13 75 leaveNode(Scalar_String)
Chris@13 76 leaveNode(Arg)
Chris@13 77 leaveNode(Expr_FuncCall)
Chris@13 78 ```
Chris@13 79
Chris@13 80 A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
Chris@13 81 performs modifications based on that. At the time when `leaveNode` is called, all the code inside
Chris@13 82 the node will have already been visited and necessary information collected.
Chris@13 83
Chris@13 84 As you usually do not want to implement all four methods, it is recommended that you extend
Chris@13 85 `NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
Chris@13 86 empty default implementations.
Chris@13 87
Chris@13 88 Modifying the AST
Chris@13 89 -----------------
Chris@13 90
Chris@13 91 There are a number of ways in which the AST can be modified from inside a node visitor. The first
Chris@13 92 and simplest is to simply change AST properties inside the visitor:
Chris@13 93
Chris@13 94 ```php
Chris@13 95 public function leaveNode(Node $node) {
Chris@13 96 if ($node instanceof Node\Scalar\LNumber) {
Chris@13 97 // increment all integer literals
Chris@13 98 $node->value++;
Chris@13 99 }
Chris@13 100 }
Chris@13 101 ```
Chris@13 102
Chris@13 103 The second is to replace a node entirely by returning a new node:
Chris@13 104
Chris@13 105 ```php
Chris@13 106 public function leaveNode(Node $node) {
Chris@13 107 if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
Chris@13 108 // Convert all $a && $b expressions into !($a && $b)
Chris@13 109 return new Node\Expr\BooleanNot($node);
Chris@13 110 }
Chris@13 111 }
Chris@13 112 ```
Chris@13 113
Chris@13 114 Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
Chris@13 115 where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
Chris@13 116 will also consider the children of the new node. If you aren't careful, this can lead to infinite
Chris@13 117 recursion. For example, let's take the previous code sample and use enterNode instead:
Chris@13 118
Chris@13 119 ```php
Chris@13 120 public function enterNode(Node $node) {
Chris@13 121 if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
Chris@13 122 // Convert all $a && $b expressions into !($a && $b)
Chris@13 123 return new Node\Expr\BooleanNot($node);
Chris@13 124 }
Chris@13 125 }
Chris@13 126 ```
Chris@13 127
Chris@13 128 Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
Chris@13 129 only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
Chris@13 130 with `!!($a && $b)`. This will continue until PHP hits the memory limit.
Chris@13 131
Chris@13 132 Finally, two special replacement types are supported only by leaveNode. The first is removal of a
Chris@13 133 node:
Chris@13 134
Chris@13 135 ```php
Chris@13 136 public function leaveNode(Node $node) {
Chris@13 137 if ($node instanceof Node\Stmt\Return_) {
Chris@13 138 // Remove all return statements
Chris@13 139 return NodeTraverser::REMOVE_NODE;
Chris@13 140 }
Chris@13 141 }
Chris@13 142 ```
Chris@13 143
Chris@13 144 Node removal only works if the parent structure is an array. This means that usually it only makes
Chris@13 145 sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
Chris@13 146 more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).
Chris@13 147
Chris@13 148 On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
Chris@13 149 meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
Chris@13 150 generally want to remove it together with a surrounding expression statement:
Chris@13 151
Chris@13 152 ```php
Chris@13 153 public function leaveNode(Node $node) {
Chris@13 154 if ($node instanceof Node\Stmt\Expression
Chris@13 155 && $node->expr instanceof Node\Expr\FuncCall
Chris@13 156 && $node->expr->name instanceof Node\Name
Chris@13 157 && $node->expr->name->toString() === 'var_dump'
Chris@13 158 ) {
Chris@13 159 return NodeTraverser::REMOVE_NODE;
Chris@13 160 }
Chris@13 161 }
Chris@13 162 ```
Chris@13 163
Chris@13 164 This example will remove all calls to `var_dump()` which occur as expression statements. This means
Chris@13 165 that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
Chris@13 166 obvious way in which it can be removed).
Chris@13 167
Chris@13 168 Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this
Chris@13 169 only works inside leaveNode and only if the parent structure is an array.
Chris@13 170
Chris@13 171 ```php
Chris@13 172 public function leaveNode(Node $node) {
Chris@13 173 if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
Chris@13 174 // Convert "return foo();" into "$retval = foo(); return $retval;"
Chris@13 175 $var = new Node\Expr\Variable('retval');
Chris@13 176 return [
Chris@13 177 new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
Chris@13 178 new Node\Stmt\Return_($var),
Chris@13 179 ];
Chris@13 180 }
Chris@13 181 }
Chris@13 182 ```
Chris@13 183
Chris@13 184 Short-circuiting traversal
Chris@13 185 --------------------------
Chris@13 186
Chris@13 187 An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
Chris@13 188 especially if you have more than one visitor. In some cases, it is possible to avoid a full
Chris@13 189 traversal.
Chris@13 190
Chris@13 191 If you are looking for all class declarations in a file (and assuming you're not interested in
Chris@13 192 anonymous classes), you know that once you've seen a class declaration, there is no point in also
Chris@13 193 checking all it's child nodes, because PHP does not allow nesting classes. In this case, you can
Chris@13 194 instruct the traverser to not recurse into the class node:
Chris@13 195
Chris@13 196 ```
Chris@13 197 private $classes = [];
Chris@13 198 public function enterNode(Node $node) {
Chris@13 199 if ($node instanceof Node\Stmt\Class_) {
Chris@13 200 $this->classes[] = $node;
Chris@13 201 return NodeTraverser::DONT_TRAVERSE_CHILDREN;
Chris@13 202 }
Chris@13 203 }
Chris@13 204 ```
Chris@13 205
Chris@13 206 Of course, this option is only available in enterNode, because it's already too late by the time
Chris@13 207 leaveNode is reached.
Chris@13 208
Chris@13 209 If you are only looking for one specific node, it is also possible to abort the traversal entirely
Chris@13 210 after finding it. For example, if you are looking for the node of a class with a certain name (and
Chris@13 211 discounting exotic cases like conditionally defining a class two times), you can stop traversal
Chris@13 212 once you found it:
Chris@13 213
Chris@13 214 ```
Chris@13 215 private $class = null;
Chris@13 216 public function enterNode(Node $node) {
Chris@13 217 if ($node instanceof Node\Stmt\Class_ &&
Chris@17 218 $node->namespacedName->toString() === 'Foo\Bar\Baz'
Chris@13 219 ) {
Chris@13 220 $this->class = $node;
Chris@13 221 return NodeTraverser::STOP_TRAVERSAL;
Chris@13 222 }
Chris@13 223 }
Chris@13 224 ```
Chris@13 225
Chris@13 226 This works both in enterNode and leaveNode. Note that this particular case can also be more easily
Chris@13 227 handled using a NodeFinder, which will be introduced below.
Chris@13 228
Chris@13 229 Multiple visitors
Chris@13 230 -----------------
Chris@13 231
Chris@13 232 A single traverser can be used with multiple visitors:
Chris@13 233
Chris@13 234 ```php
Chris@13 235 $traverser = new NodeTraverser;
Chris@13 236 $traverser->addVisitor($visitorA);
Chris@13 237 $traverser->addVisitor($visitorB);
Chris@17 238 $stmts = $traverser->traverse($stmts);
Chris@13 239 ```
Chris@13 240
Chris@13 241 It is important to understand that if a traverser is run with multiple visitors, the visitors will
Chris@13 242 be interleaved. Given the following AST excerpt
Chris@13 243
Chris@13 244 ```
Chris@13 245 Stmt_Return(
Chris@13 246 expr: Expr_Variable(
Chris@13 247 name: foobar
Chris@13 248 )
Chris@13 249 )
Chris@13 250 ```
Chris@13 251
Chris@13 252 the following method calls will be performed:
Chris@13 253
Chris@13 254 ```
Chris@13 255 $visitorA->enterNode(Stmt_Return)
Chris@13 256 $visitorB->enterNode(Stmt_Return)
Chris@13 257 $visitorA->enterNode(Expr_Variable)
Chris@13 258 $visitorB->enterNode(Expr_Variable)
Chris@13 259 $visitorA->leaveNode(Expr_Variable)
Chris@13 260 $visitorB->leaveNode(Expr_Variable)
Chris@13 261 $visitorA->leaveNode(Stmt_Return)
Chris@13 262 $visitorB->leaveNode(Stmt_Return)
Chris@13 263 ```
Chris@13 264
Chris@13 265 That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
Chris@13 266 Running multiple visitors in parallel improves performance, as the AST only has to be traversed
Chris@13 267 once. However, it is not always possible to write visitors in a way that allows interleaved
Chris@13 268 execution. In this case, you can always fall back to performing multiple traversals:
Chris@13 269
Chris@13 270 ```php
Chris@13 271 $traverserA = new NodeTraverser;
Chris@13 272 $traverserA->addVisitor($visitorA);
Chris@13 273 $traverserB = new NodeTraverser;
Chris@13 274 $traverserB->addVisitor($visitorB);
Chris@13 275 $stmts = $traverserA->traverser($stmts);
Chris@13 276 $stmts = $traverserB->traverser($stmts);
Chris@13 277 ```
Chris@13 278
Chris@13 279 When using multiple visitors, it is important to understand how they interact with the various
Chris@13 280 special enterNode/leaveNode return values:
Chris@13 281
Chris@13 282 * If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
Chris@13 283 visitors.
Chris@17 284 * If *any* visitor returns `DONT_TRAVERSE_CURRENT_AND_CHILDREN`, the children will be skipped for *all*
Chris@17 285 visitors, and all *subsequent* visitors will not visit the current node.
Chris@13 286 * If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
Chris@13 287 * If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
Chris@13 288 not the original one.
Chris@13 289 * If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
Chris@13 290 * If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
Chris@13 291 that was replaced, nor the replacement nodes.
Chris@13 292
Chris@13 293 Simple node finding
Chris@13 294 -------------------
Chris@13 295
Chris@13 296 While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
Chris@13 297 for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
Chris@13 298 satisfy a certain callback, or which are instanced of a certain node type. A couple of examples are
Chris@13 299 shown in the following:
Chris@13 300
Chris@13 301 ```php
Chris@13 302 use PhpParser\{Node, NodeFinder};
Chris@13 303
Chris@13 304 $nodeFinder = new NodeFinder;
Chris@13 305
Chris@13 306 // Find all class nodes.
Chris@13 307 $classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
Chris@13 308
Chris@13 309 // Find all classes that extend another class
Chris@17 310 $extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
Chris@13 311 return $node instanceof Node\Stmt\Class_
Chris@13 312 && $node->extends !== null;
Chris@13 313 });
Chris@13 314
Chris@13 315 // Find first class occuring in the AST. Returns null if no class exists.
Chris@13 316 $class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
Chris@13 317
Chris@13 318 // Find first class that has name $name
Chris@13 319 $class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
Chris@13 320 return $node instanceof Node\Stmt\Class_
Chris@13 321 && $node->resolvedName->toString() === $name;
Chris@13 322 });
Chris@13 323 ```
Chris@13 324
Chris@13 325 Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
Chris@13 326 common use case.
Chris@13 327
Chris@13 328 Parent and sibling references
Chris@13 329 -----------------------------
Chris@13 330
Chris@13 331 The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
Chris@13 332 be accessed: From parents to children. However, it can often be convenient to operate in the
Chris@13 333 reverse direction: When working on a node, you might want to check if the parent node satisfies a
Chris@13 334 certain property.
Chris@13 335
Chris@13 336 PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
Chris@17 337 emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.