Chris@13
|
1 Walking the AST
|
Chris@13
|
2 ===============
|
Chris@13
|
3
|
Chris@13
|
4 The most common way to work with the AST is by using a node traverser and one or more node visitors.
|
Chris@13
|
5 As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
|
Chris@13
|
6 `42` becomes `'42'`.)
|
Chris@13
|
7
|
Chris@13
|
8 ```php
|
Chris@13
|
9 use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};
|
Chris@13
|
10
|
Chris@13
|
11 $traverser = new NodeTraverser;
|
Chris@13
|
12 $traverser->addVisitor(new class extends NodeVisitorAbstract {
|
Chris@13
|
13 public function leaveNode(Node $node) {
|
Chris@13
|
14 if ($node instanceof Node\Scalar\LNumber) {
|
Chris@13
|
15 return new Node\Scalar\String_((string) $node->value);
|
Chris@13
|
16 }
|
Chris@13
|
17 }
|
Chris@13
|
18 });
|
Chris@13
|
19
|
Chris@13
|
20 $stmts = ...;
|
Chris@13
|
21 $modifiedStmts = $traverser->traverse($stmts);
|
Chris@13
|
22 ```
|
Chris@13
|
23
|
Chris@13
|
24 Node visitors
|
Chris@13
|
25 -------------
|
Chris@13
|
26
|
Chris@13
|
27 Each node visitor implements an interface with following four methods:
|
Chris@13
|
28
|
Chris@13
|
29 ```php
|
Chris@13
|
30 interface NodeVisitor {
|
Chris@13
|
31 public function beforeTraverse(array $nodes);
|
Chris@13
|
32 public function enterNode(Node $node);
|
Chris@13
|
33 public function leaveNode(Node $node);
|
Chris@13
|
34 public function afterTraverse(array $nodes);
|
Chris@13
|
35 }
|
Chris@13
|
36 ```
|
Chris@13
|
37
|
Chris@13
|
38 The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
|
Chris@13
|
39 respectively, and are passed the entire AST. They can be used to perform any necessary state
|
Chris@13
|
40 setup or cleanup.
|
Chris@13
|
41
|
Chris@13
|
42 The `enterNode()` method is called when a node is first encountered, before its children are
|
Chris@13
|
43 processed ("preorder"). The `leaveNode()` method is called after all children have been visited
|
Chris@13
|
44 ("postorder").
|
Chris@13
|
45
|
Chris@13
|
46 For example, if we have the following excerpt of an AST
|
Chris@13
|
47
|
Chris@13
|
48 ```
|
Chris@13
|
49 Expr_FuncCall(
|
Chris@13
|
50 name: Name(
|
Chris@13
|
51 parts: array(
|
Chris@13
|
52 0: printLine
|
Chris@13
|
53 )
|
Chris@13
|
54 )
|
Chris@13
|
55 args: array(
|
Chris@13
|
56 0: Arg(
|
Chris@13
|
57 value: Scalar_String(
|
Chris@13
|
58 value: Hello World!!!
|
Chris@13
|
59 )
|
Chris@13
|
60 byRef: false
|
Chris@13
|
61 unpack: false
|
Chris@13
|
62 )
|
Chris@13
|
63 )
|
Chris@13
|
64 )
|
Chris@13
|
65 ```
|
Chris@13
|
66
|
Chris@13
|
67 then the enter/leave methods will be called in the following order:
|
Chris@13
|
68
|
Chris@13
|
69 ```
|
Chris@13
|
70 enterNode(Expr_FuncCall)
|
Chris@13
|
71 enterNode(Name)
|
Chris@13
|
72 leaveNode(Name)
|
Chris@13
|
73 enterNode(Arg)
|
Chris@13
|
74 enterNode(Scalar_String)
|
Chris@13
|
75 leaveNode(Scalar_String)
|
Chris@13
|
76 leaveNode(Arg)
|
Chris@13
|
77 leaveNode(Expr_FuncCall)
|
Chris@13
|
78 ```
|
Chris@13
|
79
|
Chris@13
|
80 A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
|
Chris@13
|
81 performs modifications based on that. At the time when `leaveNode` is called, all the code inside
|
Chris@13
|
82 the node will have already been visited and necessary information collected.
|
Chris@13
|
83
|
Chris@13
|
84 As you usually do not want to implement all four methods, it is recommended that you extend
|
Chris@13
|
85 `NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
|
Chris@13
|
86 empty default implementations.
|
Chris@13
|
87
|
Chris@13
|
88 Modifying the AST
|
Chris@13
|
89 -----------------
|
Chris@13
|
90
|
Chris@13
|
91 There are a number of ways in which the AST can be modified from inside a node visitor. The first
|
Chris@13
|
92 and simplest is to simply change AST properties inside the visitor:
|
Chris@13
|
93
|
Chris@13
|
94 ```php
|
Chris@13
|
95 public function leaveNode(Node $node) {
|
Chris@13
|
96 if ($node instanceof Node\Scalar\LNumber) {
|
Chris@13
|
97 // increment all integer literals
|
Chris@13
|
98 $node->value++;
|
Chris@13
|
99 }
|
Chris@13
|
100 }
|
Chris@13
|
101 ```
|
Chris@13
|
102
|
Chris@13
|
103 The second is to replace a node entirely by returning a new node:
|
Chris@13
|
104
|
Chris@13
|
105 ```php
|
Chris@13
|
106 public function leaveNode(Node $node) {
|
Chris@13
|
107 if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
|
Chris@13
|
108 // Convert all $a && $b expressions into !($a && $b)
|
Chris@13
|
109 return new Node\Expr\BooleanNot($node);
|
Chris@13
|
110 }
|
Chris@13
|
111 }
|
Chris@13
|
112 ```
|
Chris@13
|
113
|
Chris@13
|
114 Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
|
Chris@13
|
115 where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
|
Chris@13
|
116 will also consider the children of the new node. If you aren't careful, this can lead to infinite
|
Chris@13
|
117 recursion. For example, let's take the previous code sample and use enterNode instead:
|
Chris@13
|
118
|
Chris@13
|
119 ```php
|
Chris@13
|
120 public function enterNode(Node $node) {
|
Chris@13
|
121 if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
|
Chris@13
|
122 // Convert all $a && $b expressions into !($a && $b)
|
Chris@13
|
123 return new Node\Expr\BooleanNot($node);
|
Chris@13
|
124 }
|
Chris@13
|
125 }
|
Chris@13
|
126 ```
|
Chris@13
|
127
|
Chris@13
|
128 Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
|
Chris@13
|
129 only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
|
Chris@13
|
130 with `!!($a && $b)`. This will continue until PHP hits the memory limit.
|
Chris@13
|
131
|
Chris@13
|
132 Finally, two special replacement types are supported only by leaveNode. The first is removal of a
|
Chris@13
|
133 node:
|
Chris@13
|
134
|
Chris@13
|
135 ```php
|
Chris@13
|
136 public function leaveNode(Node $node) {
|
Chris@13
|
137 if ($node instanceof Node\Stmt\Return_) {
|
Chris@13
|
138 // Remove all return statements
|
Chris@13
|
139 return NodeTraverser::REMOVE_NODE;
|
Chris@13
|
140 }
|
Chris@13
|
141 }
|
Chris@13
|
142 ```
|
Chris@13
|
143
|
Chris@13
|
144 Node removal only works if the parent structure is an array. This means that usually it only makes
|
Chris@13
|
145 sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
|
Chris@13
|
146 more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).
|
Chris@13
|
147
|
Chris@13
|
148 On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
|
Chris@13
|
149 meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
|
Chris@13
|
150 generally want to remove it together with a surrounding expression statement:
|
Chris@13
|
151
|
Chris@13
|
152 ```php
|
Chris@13
|
153 public function leaveNode(Node $node) {
|
Chris@13
|
154 if ($node instanceof Node\Stmt\Expression
|
Chris@13
|
155 && $node->expr instanceof Node\Expr\FuncCall
|
Chris@13
|
156 && $node->expr->name instanceof Node\Name
|
Chris@13
|
157 && $node->expr->name->toString() === 'var_dump'
|
Chris@13
|
158 ) {
|
Chris@13
|
159 return NodeTraverser::REMOVE_NODE;
|
Chris@13
|
160 }
|
Chris@13
|
161 }
|
Chris@13
|
162 ```
|
Chris@13
|
163
|
Chris@13
|
164 This example will remove all calls to `var_dump()` which occur as expression statements. This means
|
Chris@13
|
165 that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
|
Chris@13
|
166 obvious way in which it can be removed).
|
Chris@13
|
167
|
Chris@13
|
168 Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this
|
Chris@13
|
169 only works inside leaveNode and only if the parent structure is an array.
|
Chris@13
|
170
|
Chris@13
|
171 ```php
|
Chris@13
|
172 public function leaveNode(Node $node) {
|
Chris@13
|
173 if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
|
Chris@13
|
174 // Convert "return foo();" into "$retval = foo(); return $retval;"
|
Chris@13
|
175 $var = new Node\Expr\Variable('retval');
|
Chris@13
|
176 return [
|
Chris@13
|
177 new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
|
Chris@13
|
178 new Node\Stmt\Return_($var),
|
Chris@13
|
179 ];
|
Chris@13
|
180 }
|
Chris@13
|
181 }
|
Chris@13
|
182 ```
|
Chris@13
|
183
|
Chris@13
|
184 Short-circuiting traversal
|
Chris@13
|
185 --------------------------
|
Chris@13
|
186
|
Chris@13
|
187 An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
|
Chris@13
|
188 especially if you have more than one visitor. In some cases, it is possible to avoid a full
|
Chris@13
|
189 traversal.
|
Chris@13
|
190
|
Chris@13
|
191 If you are looking for all class declarations in a file (and assuming you're not interested in
|
Chris@13
|
192 anonymous classes), you know that once you've seen a class declaration, there is no point in also
|
Chris@13
|
193 checking all it's child nodes, because PHP does not allow nesting classes. In this case, you can
|
Chris@13
|
194 instruct the traverser to not recurse into the class node:
|
Chris@13
|
195
|
Chris@13
|
196 ```
|
Chris@13
|
197 private $classes = [];
|
Chris@13
|
198 public function enterNode(Node $node) {
|
Chris@13
|
199 if ($node instanceof Node\Stmt\Class_) {
|
Chris@13
|
200 $this->classes[] = $node;
|
Chris@13
|
201 return NodeTraverser::DONT_TRAVERSE_CHILDREN;
|
Chris@13
|
202 }
|
Chris@13
|
203 }
|
Chris@13
|
204 ```
|
Chris@13
|
205
|
Chris@13
|
206 Of course, this option is only available in enterNode, because it's already too late by the time
|
Chris@13
|
207 leaveNode is reached.
|
Chris@13
|
208
|
Chris@13
|
209 If you are only looking for one specific node, it is also possible to abort the traversal entirely
|
Chris@13
|
210 after finding it. For example, if you are looking for the node of a class with a certain name (and
|
Chris@13
|
211 discounting exotic cases like conditionally defining a class two times), you can stop traversal
|
Chris@13
|
212 once you found it:
|
Chris@13
|
213
|
Chris@13
|
214 ```
|
Chris@13
|
215 private $class = null;
|
Chris@13
|
216 public function enterNode(Node $node) {
|
Chris@13
|
217 if ($node instanceof Node\Stmt\Class_ &&
|
Chris@17
|
218 $node->namespacedName->toString() === 'Foo\Bar\Baz'
|
Chris@13
|
219 ) {
|
Chris@13
|
220 $this->class = $node;
|
Chris@13
|
221 return NodeTraverser::STOP_TRAVERSAL;
|
Chris@13
|
222 }
|
Chris@13
|
223 }
|
Chris@13
|
224 ```
|
Chris@13
|
225
|
Chris@13
|
226 This works both in enterNode and leaveNode. Note that this particular case can also be more easily
|
Chris@13
|
227 handled using a NodeFinder, which will be introduced below.
|
Chris@13
|
228
|
Chris@13
|
229 Multiple visitors
|
Chris@13
|
230 -----------------
|
Chris@13
|
231
|
Chris@13
|
232 A single traverser can be used with multiple visitors:
|
Chris@13
|
233
|
Chris@13
|
234 ```php
|
Chris@13
|
235 $traverser = new NodeTraverser;
|
Chris@13
|
236 $traverser->addVisitor($visitorA);
|
Chris@13
|
237 $traverser->addVisitor($visitorB);
|
Chris@17
|
238 $stmts = $traverser->traverse($stmts);
|
Chris@13
|
239 ```
|
Chris@13
|
240
|
Chris@13
|
241 It is important to understand that if a traverser is run with multiple visitors, the visitors will
|
Chris@13
|
242 be interleaved. Given the following AST excerpt
|
Chris@13
|
243
|
Chris@13
|
244 ```
|
Chris@13
|
245 Stmt_Return(
|
Chris@13
|
246 expr: Expr_Variable(
|
Chris@13
|
247 name: foobar
|
Chris@13
|
248 )
|
Chris@13
|
249 )
|
Chris@13
|
250 ```
|
Chris@13
|
251
|
Chris@13
|
252 the following method calls will be performed:
|
Chris@13
|
253
|
Chris@13
|
254 ```
|
Chris@13
|
255 $visitorA->enterNode(Stmt_Return)
|
Chris@13
|
256 $visitorB->enterNode(Stmt_Return)
|
Chris@13
|
257 $visitorA->enterNode(Expr_Variable)
|
Chris@13
|
258 $visitorB->enterNode(Expr_Variable)
|
Chris@13
|
259 $visitorA->leaveNode(Expr_Variable)
|
Chris@13
|
260 $visitorB->leaveNode(Expr_Variable)
|
Chris@13
|
261 $visitorA->leaveNode(Stmt_Return)
|
Chris@13
|
262 $visitorB->leaveNode(Stmt_Return)
|
Chris@13
|
263 ```
|
Chris@13
|
264
|
Chris@13
|
265 That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
|
Chris@13
|
266 Running multiple visitors in parallel improves performance, as the AST only has to be traversed
|
Chris@13
|
267 once. However, it is not always possible to write visitors in a way that allows interleaved
|
Chris@13
|
268 execution. In this case, you can always fall back to performing multiple traversals:
|
Chris@13
|
269
|
Chris@13
|
270 ```php
|
Chris@13
|
271 $traverserA = new NodeTraverser;
|
Chris@13
|
272 $traverserA->addVisitor($visitorA);
|
Chris@13
|
273 $traverserB = new NodeTraverser;
|
Chris@13
|
274 $traverserB->addVisitor($visitorB);
|
Chris@13
|
275 $stmts = $traverserA->traverser($stmts);
|
Chris@13
|
276 $stmts = $traverserB->traverser($stmts);
|
Chris@13
|
277 ```
|
Chris@13
|
278
|
Chris@13
|
279 When using multiple visitors, it is important to understand how they interact with the various
|
Chris@13
|
280 special enterNode/leaveNode return values:
|
Chris@13
|
281
|
Chris@13
|
282 * If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
|
Chris@13
|
283 visitors.
|
Chris@17
|
284 * If *any* visitor returns `DONT_TRAVERSE_CURRENT_AND_CHILDREN`, the children will be skipped for *all*
|
Chris@17
|
285 visitors, and all *subsequent* visitors will not visit the current node.
|
Chris@13
|
286 * If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
|
Chris@13
|
287 * If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
|
Chris@13
|
288 not the original one.
|
Chris@13
|
289 * If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
|
Chris@13
|
290 * If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
|
Chris@13
|
291 that was replaced, nor the replacement nodes.
|
Chris@13
|
292
|
Chris@13
|
293 Simple node finding
|
Chris@13
|
294 -------------------
|
Chris@13
|
295
|
Chris@13
|
296 While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
|
Chris@13
|
297 for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
|
Chris@13
|
298 satisfy a certain callback, or which are instanced of a certain node type. A couple of examples are
|
Chris@13
|
299 shown in the following:
|
Chris@13
|
300
|
Chris@13
|
301 ```php
|
Chris@13
|
302 use PhpParser\{Node, NodeFinder};
|
Chris@13
|
303
|
Chris@13
|
304 $nodeFinder = new NodeFinder;
|
Chris@13
|
305
|
Chris@13
|
306 // Find all class nodes.
|
Chris@13
|
307 $classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
|
Chris@13
|
308
|
Chris@13
|
309 // Find all classes that extend another class
|
Chris@17
|
310 $extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
|
Chris@13
|
311 return $node instanceof Node\Stmt\Class_
|
Chris@13
|
312 && $node->extends !== null;
|
Chris@13
|
313 });
|
Chris@13
|
314
|
Chris@13
|
315 // Find first class occuring in the AST. Returns null if no class exists.
|
Chris@13
|
316 $class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
|
Chris@13
|
317
|
Chris@13
|
318 // Find first class that has name $name
|
Chris@13
|
319 $class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
|
Chris@13
|
320 return $node instanceof Node\Stmt\Class_
|
Chris@13
|
321 && $node->resolvedName->toString() === $name;
|
Chris@13
|
322 });
|
Chris@13
|
323 ```
|
Chris@13
|
324
|
Chris@13
|
325 Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
|
Chris@13
|
326 common use case.
|
Chris@13
|
327
|
Chris@13
|
328 Parent and sibling references
|
Chris@13
|
329 -----------------------------
|
Chris@13
|
330
|
Chris@13
|
331 The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
|
Chris@13
|
332 be accessed: From parents to children. However, it can often be convenient to operate in the
|
Chris@13
|
333 reverse direction: When working on a node, you might want to check if the parent node satisfies a
|
Chris@13
|
334 certain property.
|
Chris@13
|
335
|
Chris@13
|
336 PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
|
Chris@17
|
337 emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.
|