annotate vendor/nikic/php-parser/doc/component/Walking_the_AST.markdown @ 5:12f9dff5fda9 tip

Update to Drupal core 8.7.1
author Chris Cannam
date Thu, 09 May 2019 15:34:47 +0100
parents a9cd425dd02b
children
rev   line source
Chris@0 1 Walking the AST
Chris@0 2 ===============
Chris@0 3
Chris@0 4 The most common way to work with the AST is by using a node traverser and one or more node visitors.
Chris@0 5 As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
Chris@0 6 `42` becomes `'42'`).
Chris@0 7
Chris@0 8 ```php
Chris@0 9 use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};
Chris@0 10
Chris@0 11 $traverser = new NodeTraverser;
Chris@0 12 $traverser->addVisitor(new class extends NodeVisitorAbstract {
Chris@0 13     public function leaveNode(Node $node) {
Chris@0 14         if ($node instanceof Node\Scalar\LNumber) {
Chris@0 15             return new Node\Scalar\String_((string) $node->value);
Chris@0 16         }
Chris@0 17     }
Chris@0 18 });
Chris@0 19
Chris@0 20 $stmts = ...;
Chris@0 21 $modifiedStmts = $traverser->traverse($stmts);
Chris@0 22 ```
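
To try the snippet end to end, `$stmts` has to come from a parser. The following is a minimal sketch, assuming php-parser 4.x installed through Composer; the `vendor/autoload.php` path and the one-line sample script are assumptions for illustration, not part of the original example. It parses the script, applies the same visitor, and prints the modified code.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory, PrettyPrinter};

// Parse a small sample script into an array of statement nodes.
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php echo 42 + 1;');

$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\LNumber) {
            // Replace the integer literal with a string literal holding the same digits.
            return new Node\Scalar\String_((string) $node->value);
        }
    }
});

$modifiedStmts = $traverser->traverse($stmts);

// Turn the modified AST back into source code.
$output = (new PrettyPrinter\Standard)->prettyPrint($modifiedStmts);
echo $output, "\n";
```

Both integer literals in the sample input are visited, so both should come back as quoted strings in the printed code.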
Chris@0 23
Chris@0 24 Node visitors
Chris@0 25 -------------
Chris@0 26
Chris@0 27 Each node visitor implements an interface with the following four methods:
Chris@0 28
Chris@0 29 ```php
Chris@0 30 interface NodeVisitor {
Chris@0 31     public function beforeTraverse(array $nodes);
Chris@0 32     public function enterNode(Node $node);
Chris@0 33     public function leaveNode(Node $node);
Chris@0 34     public function afterTraverse(array $nodes);
Chris@0 35 }
Chris@0 36 ```
Chris@0 37
Chris@0 38 The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
Chris@0 39 respectively, and are passed the entire AST. They can be used to perform any necessary state
Chris@0 40 setup or cleanup.
Chris@0 41
Chris@0 42 The `enterNode()` method is called when a node is first encountered, before its children are
Chris@0 43 processed ("preorder"). The `leaveNode()` method is called after all children have been visited
Chris@0 44 ("postorder").
Chris@0 45
Chris@0 46 For example, if we have the following excerpt of an AST
Chris@0 47
Chris@0 48 ```
Chris@0 49 Expr_FuncCall(
Chris@0 50     name: Name(
Chris@0 51         parts: array(
Chris@0 52             0: printLine
Chris@0 53         )
Chris@0 54     )
Chris@0 55     args: array(
Chris@0 56         0: Arg(
Chris@0 57             value: Scalar_String(
Chris@0 58                 value: Hello World!!!
Chris@0 59             )
Chris@0 60             byRef: false
Chris@0 61             unpack: false
Chris@0 62         )
Chris@0 63     )
Chris@0 64 )
Chris@0 65 ```
Chris@0 66
Chris@0 67 then the enter/leave methods will be called in the following order:
Chris@0 68
Chris@0 69 ```
Chris@0 70 enterNode(Expr_FuncCall)
Chris@0 71 enterNode(Name)
Chris@0 72 leaveNode(Name)
Chris@0 73 enterNode(Arg)
Chris@0 74 enterNode(Scalar_String)
Chris@0 75 leaveNode(Scalar_String)
Chris@0 76 leaveNode(Arg)
Chris@0 77 leaveNode(Expr_FuncCall)
Chris@0 78 ```
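
The call order can be observed directly with a logging visitor. This is a sketch under the assumption of php-parser 4.x and a Composer autoloader; the anonymous logging class and its `$log` property are hypothetical helpers. Because the sample is parsed from a complete statement, the log also contains an outer `Stmt_Expression` pair around the calls listed above.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory};

// Hypothetical helper visitor: records each callback together with the node type.
$logger = new class extends NodeVisitorAbstract {
    public $log = [];

    public function enterNode(Node $node) {
        $this->log[] = 'enterNode(' . $node->getType() . ')';
    }

    public function leaveNode(Node $node) {
        $this->log[] = 'leaveNode(' . $node->getType() . ')';
    }
};

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse("<?php printLine('Hello World!!!');");

$traverser = new NodeTraverser;
$traverser->addVisitor($logger);
$traverser->traverse($stmts);

// One line per callback, in the order the traverser made the calls.
echo implode("\n", $logger->log), "\n";
```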
Chris@0 79
Chris@0 80 A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
Chris@0 81 performs modifications based on that. At the time when `leaveNode` is called, all the code inside
Chris@0 82 the node will have already been visited and necessary information collected.
Chris@0 83
Chris@0 84 As you usually do not want to implement all four methods, it is recommended that you extend
Chris@0 85 `NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
Chris@0 86 empty default implementations.
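
As an illustration of state setup in `beforeTraverse()` combined with collection in `enterNode()`, here is a minimal sketch, assuming php-parser 4.x and a Composer autoloader. The `FuncCallCounter` class and the sample input are hypothetical; only the two methods that are needed get overridden, and the rest come from `NodeVisitorAbstract`.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory};

// Hypothetical visitor: counts calls to named functions.
class FuncCallCounter extends NodeVisitorAbstract {
    public $counts = [];

    public function beforeTraverse(array $nodes) {
        // Reset state so the same visitor instance can be reused across traversals.
        $this->counts = [];
    }

    public function enterNode(Node $node) {
        // Skip dynamic calls such as $f(), whose name is an expression rather than a Name.
        if ($node instanceof Node\Expr\FuncCall && $node->name instanceof Node\Name) {
            $name = $node->name->toString();
            $this->counts[$name] = ($this->counts[$name] ?? 0) + 1;
        }
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php foo(); bar(foo()); $f();');

$counter = new FuncCallCounter;
$traverser = new NodeTraverser;
$traverser->addVisitor($counter);

$traverser->traverse($stmts);
$traverser->traverse($stmts); // the second run starts from a clean state again

print_r($counter->counts);
```

Without the reset in `beforeTraverse()`, the second traversal would double every count.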
Chris@0 87
Chris@0 88 Modifying the AST
Chris@0 89 -----------------
Chris@0 90
Chris@0 91 There are a number of ways in which the AST can be modified from inside a node visitor. The first
Chris@0 92 and simplest is to change AST properties directly inside the visitor:
Chris@0 93
Chris@0 94 ```php
Chris@0 95 public function leaveNode(Node $node) {
Chris@0 96     if ($node instanceof Node\Scalar\LNumber) {
Chris@0 97         // increment all integer literals
Chris@0 98         $node->value++;
Chris@0 99     }
Chris@0 100 }
Chris@0 101 ```
Chris@0 102
Chris@0 103 The second is to replace a node entirely by returning a new node:
Chris@0 104
Chris@0 105 ```php
Chris@0 106 public function leaveNode(Node $node) {
Chris@0 107     if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
Chris@0 108         // Convert all $a && $b expressions into !($a && $b)
Chris@0 109         return new Node\Expr\BooleanNot($node);
Chris@0 110     }
Chris@0 111 }
Chris@0 112 ```
Chris@0 113
Chris@0 114 Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
Chris@0 115 where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
Chris@0 116 will also consider the children of the new node. If you aren't careful, this can lead to infinite
Chris@0 117 recursion. For example, let's take the previous code sample and use enterNode instead:
Chris@0 118
Chris@0 119 ```php
Chris@0 120 public function enterNode(Node $node) {
Chris@0 121     if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
Chris@0 122         // Convert all $a && $b expressions into !($a && $b)
Chris@0 123         return new Node\Expr\BooleanNot($node);
Chris@0 124     }
Chris@0 125 }
Chris@0 126 ```
Chris@0 127
Chris@0 128 Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
Chris@0 129 only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
Chris@0 130 with `!!($a && $b)`. This will continue until PHP hits the memory limit.
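
If the replacement really has to happen in enterNode, one possible way to break the cycle is to mark the original node before wrapping it, so that it is skipped when the traverser descends into the replacement. This is only a sketch, assuming php-parser 4.x and a Composer autoloader; the attribute name `wrapped` is a hypothetical choice, and performing the replacement in leaveNode remains the simpler option.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory, PrettyPrinter};

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php $a && $b;');

$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function enterNode(Node $node) {
        if ($node instanceof Node\Expr\BinaryOp\BooleanAnd
            && !$node->getAttribute('wrapped', false) // hypothetical marker attribute
        ) {
            // Mark the node first: the traverser will reach it again as the
            // child of the new BooleanNot, and it must not be wrapped twice.
            $node->setAttribute('wrapped', true);
            return new Node\Expr\BooleanNot($node);
        }
    }
});

$stmts = $traverser->traverse($stmts);
$output = (new PrettyPrinter\Standard)->prettyPrint($stmts);
echo $output, "\n";
```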
Chris@0 131
Chris@0 132 Finally, two special replacement types are supported only by leaveNode. The first is removal of a
Chris@0 133 node:
Chris@0 134
Chris@0 135 ```php
Chris@0 136 public function leaveNode(Node $node) {
Chris@0 137     if ($node instanceof Node\Stmt\Return_) {
Chris@0 138         // Remove all return statements
Chris@0 139         return NodeTraverser::REMOVE_NODE;
Chris@0 140     }
Chris@0 141 }
Chris@0 142 ```
Chris@0 143
Chris@0 144 Node removal only works if the parent structure is an array. This means that usually it only makes
Chris@0 145 sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
Chris@0 146 more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).
Chris@0 147
Chris@0 148 On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
Chris@0 149 meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
Chris@0 150 generally want to remove it together with a surrounding expression statement:
Chris@0 151
Chris@0 152 ```php
Chris@0 153 public function leaveNode(Node $node) {
Chris@0 154     if ($node instanceof Node\Stmt\Expression
Chris@0 155         && $node->expr instanceof Node\Expr\FuncCall
Chris@0 156         && $node->expr->name instanceof Node\Name
Chris@0 157         && $node->expr->name->toString() === 'var_dump'
Chris@0 158     ) {
Chris@0 159         return NodeTraverser::REMOVE_NODE;
Chris@0 160     }
Chris@0 161 }
Chris@0 162 ```
Chris@0 163
Chris@0 164 This example will remove all calls to `var_dump()` which occur as expression statements. This means
Chris@0 165 that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
Chris@0 166 obvious way in which it can be removed).
Chris@0 167
Chris@0 168 Besides removing nodes, it is also possible to replace one node with multiple nodes. Again, this
Chris@0 169 only works inside leaveNode and only if the parent structure is an array.
Chris@0 170
Chris@0 171 ```php
Chris@0 172 public function leaveNode(Node $node) {
Chris@0 173     if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
Chris@0 174         // Convert "return foo();" into "$retval = foo(); return $retval;"
Chris@0 175         $var = new Node\Expr\Variable('retval');
Chris@0 176         return [
Chris@0 177             new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
Chris@0 178             new Node\Stmt\Return_($var),
Chris@0 179         ];
Chris@0 180     }
Chris@0 181 }
Chris@0 182 ```
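
Both list-based replacements (removal and one-to-many replacement) can be combined in a single visitor. The following sketch assumes php-parser 4.x and a Composer autoloader, and uses a made-up function body as input. It removes the `var_dump()` statement and splits the return statement, so the function body should still contain exactly two statements afterwards.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory, PrettyPrinter};

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php function f() { var_dump(1); return foo(); }');

$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        // Remove var_dump() calls that appear as expression statements.
        if ($node instanceof Node\Stmt\Expression
            && $node->expr instanceof Node\Expr\FuncCall
            && $node->expr->name instanceof Node\Name
            && $node->expr->name->toString() === 'var_dump'
        ) {
            return NodeTraverser::REMOVE_NODE;
        }

        // Split "return foo();" into an assignment followed by a return.
        if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
            $var = new Node\Expr\Variable('retval');
            return [
                new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
                new Node\Stmt\Return_($var),
            ];
        }
    }
});

$stmts = $traverser->traverse($stmts);
$body = $stmts[0]->stmts; // statements inside function f()

echo (new PrettyPrinter\Standard)->prettyPrint($stmts), "\n";
```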
Chris@0 183
Chris@0 184 Short-circuiting traversal
Chris@0 185 --------------------------
Chris@0 186
Chris@0 187 An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
Chris@0 188 especially if you have more than one visitor. In some cases, it is possible to avoid a full
Chris@0 189 traversal.
Chris@0 190
Chris@0 191 If you are looking for all class declarations in a file (and assuming you're not interested in
Chris@0 192 anonymous classes), you know that once you've seen a class declaration, there is no point in also
Chris@0 193 checking all its child nodes, because PHP does not allow nesting classes. In this case, you can
Chris@0 194 instruct the traverser to not recurse into the class node:
Chris@0 195
Chris@0 196 ```php
Chris@0 197 private $classes = [];
Chris@0 198 public function enterNode(Node $node) {
Chris@0 199     if ($node instanceof Node\Stmt\Class_) {
Chris@0 200         $this->classes[] = $node;
Chris@0 201         return NodeTraverser::DONT_TRAVERSE_CHILDREN;
Chris@0 202     }
Chris@0 203 }
Chris@0 204 ```
Chris@0 205
Chris@0 206 Of course, this option is only available in enterNode, because it's already too late by the time
Chris@0 207 leaveNode is reached.
Chris@0 208
Chris@0 209 If you are only looking for one specific node, it is also possible to abort the traversal entirely
Chris@0 210 after finding it. For example, if you are looking for the node of a class with a certain name (and
Chris@0 211 discounting exotic cases like conditionally defining a class two times), you can stop traversal
Chris@0 212 once you have found it:
Chris@0 213
Chris@0 214 ```php
Chris@0 215 private $class = null;
Chris@0 216 public function enterNode(Node $node) {
Chris@0 217     if ($node instanceof Node\Stmt\Class_ &&
Chris@4 218         $node->namespacedName->toString() === 'Foo\Bar\Baz'
Chris@0 219     ) {
Chris@0 220         $this->class = $node;
Chris@0 221         return NodeTraverser::STOP_TRAVERSAL;
Chris@0 222     }
Chris@0 223 }
Chris@0 224 ```
Chris@0 225
Chris@0 226 This works both in enterNode and leaveNode. Note that this particular case can also be more easily
Chris@0 227 handled using a NodeFinder, which will be introduced below.
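
Note that `namespacedName` is only populated once the `NameResolver` visitor has processed the AST. The sketch below, assuming php-parser 4.x and a Composer autoloader, runs the resolver in a first traversal and the class lookup in a second one. The sample namespace and the `$visited` counter are illustrative additions used to show that traversal stops early.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory};
use PhpParser\NodeVisitor\NameResolver;

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php namespace Foo\Bar; class Baz {} class Qux {}');

// First pass: NameResolver sets namespacedName on class declarations.
$resolving = new NodeTraverser;
$resolving->addVisitor(new NameResolver);
$stmts = $resolving->traverse($stmts);

// Second pass: stop as soon as the wanted class has been found.
$finder = new class extends NodeVisitorAbstract {
    public $class = null;
    public $visited = 0; // illustrative counter of enterNode() calls

    public function enterNode(Node $node) {
        $this->visited++;
        if ($node instanceof Node\Stmt\Class_
            && isset($node->namespacedName) // not set for anonymous classes
            && $node->namespacedName->toString() === 'Foo\Bar\Baz'
        ) {
            $this->class = $node;
            return NodeTraverser::STOP_TRAVERSAL;
        }
    }
};

$searching = new NodeTraverser;
$searching->addVisitor($finder);
$searching->traverse($stmts);

echo $finder->class->name->toString(), ' found after ', $finder->visited, " nodes\n";
```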
Chris@0 228
Chris@0 229 Multiple visitors
Chris@0 230 -----------------
Chris@0 231
Chris@0 232 A single traverser can be used with multiple visitors:
Chris@0 233
Chris@0 234 ```php
Chris@0 235 $traverser = new NodeTraverser;
Chris@0 236 $traverser->addVisitor($visitorA);
Chris@0 237 $traverser->addVisitor($visitorB);
Chris@4 238 $stmts = $traverser->traverse($stmts);
Chris@0 239 ```
Chris@0 240
Chris@0 241 It is important to understand that if a traverser is run with multiple visitors, the visitors will
Chris@0 242 be interleaved. Given the following AST excerpt
Chris@0 243
Chris@0 244 ```
Chris@0 245 Stmt_Return(
Chris@0 246     expr: Expr_Variable(
Chris@0 247         name: foobar
Chris@0 248     )
Chris@0 249 )
Chris@0 250 ```
Chris@0 251
Chris@0 252 the following method calls will be performed:
Chris@0 253
Chris@0 254 ```
Chris@0 255 $visitorA->enterNode(Stmt_Return)
Chris@0 256 $visitorB->enterNode(Stmt_Return)
Chris@0 257 $visitorA->enterNode(Expr_Variable)
Chris@0 258 $visitorB->enterNode(Expr_Variable)
Chris@0 259 $visitorA->leaveNode(Expr_Variable)
Chris@0 260 $visitorB->leaveNode(Expr_Variable)
Chris@0 261 $visitorA->leaveNode(Stmt_Return)
Chris@0 262 $visitorB->leaveNode(Stmt_Return)
Chris@0 263 ```
Chris@0 264
Chris@0 265 That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
Chris@0 266 Running multiple visitors in parallel improves performance, as the AST only has to be traversed
Chris@0 267 once. However, it is not always possible to write visitors in a way that allows interleaved
Chris@0 268 execution. In this case, you can always fall back to performing multiple traversals:
Chris@0 269
Chris@0 270 ```php
Chris@0 271 $traverserA = new NodeTraverser;
Chris@0 272 $traverserA->addVisitor($visitorA);
Chris@0 273 $traverserB = new NodeTraverser;
Chris@0 274 $traverserB->addVisitor($visitorB);
Chris@0 275 $stmts = $traverserA->traverse($stmts);
Chris@0 276 $stmts = $traverserB->traverse($stmts);
Chris@0 277 ```
Chris@0 278
Chris@0 279 When using multiple visitors, it is important to understand how they interact with the various
Chris@0 280 special enterNode/leaveNode return values:
Chris@0 281
Chris@0 282 * If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
Chris@0 283 visitors.
Chris@4 284 * If *any* visitor returns `DONT_TRAVERSE_CURRENT_AND_CHILDREN`, the children will be skipped for *all*
Chris@4 285 visitors, and all *subsequent* visitors will not visit the current node.
Chris@0 286 * If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
Chris@0 287 * If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
Chris@0 288 not the original one.
Chris@0 289 * If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
Chris@0 290 * If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
Chris@0 291 that was replaced, nor the replacement nodes.
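
The effect of visitor order on replacement nodes can be checked with two small visitors. This is a sketch, assuming php-parser 4.x and a Composer autoloader; `IntToString`, `TypeRecorder` and `recordedTypes()` are hypothetical names introduced for the illustration. The recorder sees the replacement string node when it is registered after the replacing visitor, and the original integer node when it is registered before it.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract, ParserFactory};

// Hypothetical visitor: replaces integer literals with string literals.
class IntToString extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\LNumber) {
            return new Node\Scalar\String_((string) $node->value);
        }
    }
}

// Hypothetical visitor: records the type of every node passed to leaveNode().
class TypeRecorder extends NodeVisitorAbstract {
    public $seen = [];

    public function leaveNode(Node $node) {
        $this->seen[] = $node->getType();
    }
}

// Parses a fresh AST on every call, because traversal modifies nodes in place.
function recordedTypes(bool $replacerFirst): array {
    $parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
    $stmts = $parser->parse('<?php echo 42;');

    $recorder = new TypeRecorder;
    $visitors = $replacerFirst
        ? [new IntToString, $recorder]
        : [$recorder, new IntToString];

    $traverser = new NodeTraverser;
    foreach ($visitors as $visitor) {
        $traverser->addVisitor($visitor);
    }
    $traverser->traverse($stmts);

    return $recorder->seen;
}

$afterReplacer = recordedTypes(true);
$beforeReplacer = recordedTypes(false);

echo implode(', ', $afterReplacer), "\n";
echo implode(', ', $beforeReplacer), "\n";
```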
Chris@0 292
Chris@0 293 Simple node finding
Chris@0 294 -------------------
Chris@0 295
Chris@0 296 While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
Chris@0 297 for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
Chris@0 298 satisfy a certain callback, or which are instances of a certain node type. A couple of examples are
Chris@0 299 shown below:
Chris@0 300
Chris@0 301 ```php
Chris@0 302 use PhpParser\{Node, NodeFinder};
Chris@0 303
Chris@0 304 $nodeFinder = new NodeFinder;
Chris@0 305
Chris@0 306 // Find all class nodes.
Chris@0 307 $classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
Chris@0 308
Chris@0 309 // Find all classes that extend another class
Chris@4 310 $extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
Chris@0 311     return $node instanceof Node\Stmt\Class_
Chris@0 312         && $node->extends !== null;
Chris@0 313 });
Chris@0 314
Chris@0 315 // Find first class occurring in the AST. Returns null if no class exists.
Chris@0 316 $class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
Chris@0 317
Chris@0 318 // Find first class that has name $name
Chris@0 319 $class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
Chris@0 320     return $node instanceof Node\Stmt\Class_
Chris@0 321         && $node->namespacedName->toString() === $name;
Chris@0 322 });
Chris@0 323 ```
Chris@0 324
Chris@0 325 Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
Chris@0 326 common use case.
Chris@0 327
Chris@0 328 Parent and sibling references
Chris@0 329 -----------------------------
Chris@0 330
Chris@0 331 The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
Chris@0 332 be accessed: From parents to children. However, it can often be convenient to operate in the
Chris@0 333 reverse direction: When working on a node, you might want to check if the parent node satisfies a
Chris@0 334 certain property.
Chris@0 335
Chris@0 336 PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
Chris@4 337 emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.
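
A parent-connecting visitor might look roughly like the following. This is a sketch under the assumption of php-parser 4.x and a Composer autoloader; the `ParentConnector` class and the `parent` attribute name are illustrative choices, not something the library defines for you. The visitor keeps a stack of the nodes currently being traversed and stores the top of the stack on each newly entered node.

```php
<?php
require 'vendor/autoload.php'; // assumption: Composer autoloader in the working directory

use PhpParser\{Node, NodeFinder, NodeTraverser, NodeVisitorAbstract, ParserFactory};

// Hypothetical visitor: stores the enclosing node in a "parent" attribute.
class ParentConnector extends NodeVisitorAbstract {
    private $stack = [];

    public function beforeTraverse(array $nodes) {
        $this->stack = [];
    }

    public function enterNode(Node $node) {
        if ($this->stack) {
            // The innermost node still being traversed is the parent.
            $node->setAttribute('parent', end($this->stack));
        }
        $this->stack[] = $node;
    }

    public function leaveNode(Node $node) {
        array_pop($this->stack);
    }
}

$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php return $foobar;');

$traverser = new NodeTraverser;
$traverser->addVisitor(new ParentConnector);
$stmts = $traverser->traverse($stmts);

$variable = (new NodeFinder)->findFirstInstanceOf($stmts, Node\Expr\Variable::class);
$parentType = $variable->getAttribute('parent')->getType();
echo $parentType, "\n";
```

Top-level statements have no enclosing node, so their `parent` attribute stays unset.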