Mercurial > hg > cmmr2012-drupal-site
comparison vendor/nikic/php-parser/doc/component/Walking_the_AST.markdown @ 0:c75dbcec494b
Initial commit from drush-created site
author | Chris Cannam |
---|---|
date | Thu, 05 Jul 2018 14:24:15 +0000 |
parents | |
children | a9cd425dd02b |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:c75dbcec494b |
---|---|
1 Walking the AST | |
2 =============== | |
3 | |
4 The most common way to work with the AST is by using a node traverser and one or more node visitors. | |
5 As a basic example, the following code changes all literal integers in the AST into strings (e.g., | |
6 `42` becomes `'42'`.) | |
7 | |
8 ```php | |
9 use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract}; | |
10 | |
11 $traverser = new NodeTraverser; | |
12 $traverser->addVisitor(new class extends NodeVisitorAbstract { | |
13 public function leaveNode(Node $node) { | |
14 if ($node instanceof Node\Scalar\LNumber) { | |
15 return new Node\Scalar\String_((string) $node->value); | |
16 } | |
17 } | |
18 }); | |
19 | |
20 $stmts = ...; | |
21 $modifiedStmts = $traverser->traverse($stmts); | |
22 ``` | |
23 | |
24 Node visitors | |
25 ------------- | |
26 | |
27 Each node visitor implements an interface with following four methods: | |
28 | |
29 ```php | |
30 interface NodeVisitor { | |
31 public function beforeTraverse(array $nodes); | |
32 public function enterNode(Node $node); | |
33 public function leaveNode(Node $node); | |
34 public function afterTraverse(array $nodes); | |
35 } | |
36 ``` | |
37 | |
38 The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal | |
39 respectively, and are passed the entire AST. They can be used to perform any necessary state | |
40 setup or cleanup. | |
41 | |
42 The `enterNode()` method is called when a node is first encountered, before its children are | |
43 processed ("preorder"). The `leaveNode()` method is called after all children have been visited | |
44 ("postorder"). | |
45 | |
46 For example, if we have the following excerpt of an AST | |
47 | |
48 ``` | |
49 Expr_FuncCall( | |
50 name: Name( | |
51 parts: array( | |
52 0: printLine | |
53 ) | |
54 ) | |
55 args: array( | |
56 0: Arg( | |
57 value: Scalar_String( | |
58 value: Hello World!!! | |
59 ) | |
60 byRef: false | |
61 unpack: false | |
62 ) | |
63 ) | |
64 ) | |
65 ``` | |
66 | |
67 then the enter/leave methods will be called in the following order: | |
68 | |
69 ``` | |
70 enterNode(Expr_FuncCall) | |
71 enterNode(Name) | |
72 leaveNode(Name) | |
73 enterNode(Arg) | |
74 enterNode(Scalar_String) | |
75 leaveNode(Scalar_String) | |
76 leaveNode(Arg) | |
77 leaveNode(Expr_FuncCall) | |
78 ``` | |
79 | |
80 A common pattern is that `enterNode` is used to collect some information and then `leaveNode` | |
81 performs modifications based on that. At the time when `leaveNode` is called, all the code inside | |
82 the node will have already been visited and necessary information collected. | |
83 | |
84 As you usually do not want to implement all four methods, it is recommended that you extend | |
85 `NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides | |
86 empty default implementations. | |
87 | |
88 Modifying the AST | |
89 ----------------- | |
90 | |
91 There are a number of ways in which the AST can be modified from inside a node visitor. The first | |
92 and simplest is to simply change AST properties inside the visitor: | |
93 | |
94 ```php | |
95 public function leaveNode(Node $node) { | |
96 if ($node instanceof Node\Scalar\LNumber) { | |
97 // increment all integer literals | |
98 $node->value++; | |
99 } | |
100 } | |
101 ``` | |
102 | |
103 The second is to replace a node entirely by returning a new node: | |
104 | |
105 ```php | |
106 public function leaveNode(Node $node) { | |
107 if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) { | |
108 // Convert all $a && $b expressions into !($a && $b) | |
109 return new Node\Expr\BooleanNot($node); | |
110 } | |
111 } | |
112 ``` | |
113 | |
114 Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about | |
115 where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal | |
116 will also consider the children of the new node. If you aren't careful, this can lead to infinite | |
117 recursion. For example, let's take the previous code sample and use enterNode instead: | |
118 | |
119 ```php | |
120 public function enterNode(Node $node) { | |
121 if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) { | |
122 // Convert all $a && $b expressions into !($a && $b) | |
123 return new Node\Expr\BooleanNot($node); | |
124 } | |
125 } | |
126 ``` | |
127 | |
128 Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and | |
129 only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up | |
130 with `!!($a && $b)`. This will continue until PHP hits the memory limit. | |
131 | |
132 Finally, two special replacement types are supported only by leaveNode. The first is removal of a | |
133 node: | |
134 | |
135 ```php | |
136 public function leaveNode(Node $node) { | |
137 if ($node instanceof Node\Stmt\Return_) { | |
138 // Remove all return statements | |
139 return NodeTraverser::REMOVE_NODE; | |
140 } | |
141 } | |
142 ``` | |
143 | |
144 Node removal only works if the parent structure is an array. This means that usually it only makes | |
145 sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few | |
146 more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists). | |
147 | |
148 On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no | |
149 meaningful way in which the `$a` part could be removed. If you want to remove an expression, you | |
150 generally want to remove it together with a surrounding expression statement: | |
151 | |
152 ```php | |
153 public function leaveNode(Node $node) { | |
154 if ($node instanceof Node\Stmt\Expression | |
155 && $node->expr instanceof Node\Expr\FuncCall | |
156 && $node->expr->name instanceof Node\Name | |
157 && $node->expr->name->toString() === 'var_dump' | |
158 ) { | |
159 return NodeTraverser::REMOVE_NODE; | |
160 } | |
161 } | |
162 ``` | |
163 | |
164 This example will remove all calls to `var_dump()` which occur as expression statements. This means | |
165 that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no | |
166 obvious way in which it can be removed). | |
167 | |
168 Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this | |
169 only works inside leaveNode and only if the parent structure is an array. | |
170 | |
171 ```php | |
172 public function leaveNode(Node $node) { | |
173 if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) { | |
174 // Convert "return foo();" into "$retval = foo(); return $retval;" | |
175 $var = new Node\Expr\Variable('retval'); | |
176 return [ | |
177 new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)), | |
178 new Node\Stmt\Return_($var), | |
179 ]; | |
180 } | |
181 } | |
182 ``` | |
183 | |
184 Short-circuiting traversal | |
185 -------------------------- | |
186 | |
187 An AST can easily contain thousands of nodes, and traversing over all of them may be slow, | |
188 especially if you have more than one visitor. In some cases, it is possible to avoid a full | |
189 traversal. | |
190 | |
191 If you are looking for all class declarations in a file (and assuming you're not interested in | |
192 anonymous classes), you know that once you've seen a class declaration, there is no point in also | |
193 checking all it's child nodes, because PHP does not allow nesting classes. In this case, you can | |
194 instruct the traverser to not recurse into the class node: | |
195 | |
196 ``` | |
197 private $classes = []; | |
198 public function enterNode(Node $node) { | |
199 if ($node instanceof Node\Stmt\Class_) { | |
200 $this->classes[] = $node; | |
201 return NodeTraverser::DONT_TRAVERSE_CHILDREN; | |
202 } | |
203 } | |
204 ``` | |
205 | |
206 Of course, this option is only available in enterNode, because it's already too late by the time | |
207 leaveNode is reached. | |
208 | |
209 If you are only looking for one specific node, it is also possible to abort the traversal entirely | |
210 after finding it. For example, if you are looking for the node of a class with a certain name (and | |
211 discounting exotic cases like conditionally defining a class two times), you can stop traversal | |
212 once you found it: | |
213 | |
214 ``` | |
215 private $class = null; | |
216 public function enterNode(Node $node) { | |
217 if ($node instanceof Node\Stmt\Class_ && | |
218 $node->namespaceName->toString() === 'Foo\Bar\Baz' | |
219 ) { | |
220 $this->class = $node; | |
221 return NodeTraverser::STOP_TRAVERSAL; | |
222 } | |
223 } | |
224 ``` | |
225 | |
226 This works both in enterNode and leaveNode. Note that this particular case can also be more easily | |
227 handled using a NodeFinder, which will be introduced below. | |
228 | |
229 Multiple visitors | |
230 ----------------- | |
231 | |
232 A single traverser can be used with multiple visitors: | |
233 | |
234 ```php | |
235 $traverser = new NodeTraverser; | |
236 $traverser->addVisitor($visitorA); | |
237 $traverser->addVisitor($visitorB); | |
238 $stmts = $traverser->traverser($stmts); | |
239 ``` | |
240 | |
241 It is important to understand that if a traverser is run with multiple visitors, the visitors will | |
242 be interleaved. Given the following AST excerpt | |
243 | |
244 ``` | |
245 Stmt_Return( | |
246 expr: Expr_Variable( | |
247 name: foobar | |
248 ) | |
249 ) | |
250 ``` | |
251 | |
252 the following method calls will be performed: | |
253 | |
254 ``` | |
255 $visitorA->enterNode(Stmt_Return) | |
256 $visitorB->enterNode(Stmt_Return) | |
257 $visitorA->enterNode(Expr_Variable) | |
258 $visitorB->enterNode(Expr_Variable) | |
259 $visitorA->leaveNode(Expr_Variable) | |
260 $visitorB->leaveNode(Expr_Variable) | |
261 $visitorA->leaveNode(Stmt_Return) | |
262 $visitorB->leaveNode(Stmt_Return) | |
263 ``` | |
264 | |
265 That is, when visiting a node, enterNode and leaveNode will always be called for all visitors. | |
266 Running multiple visitors in parallel improves performance, as the AST only has to be traversed | |
267 once. However, it is not always possible to write visitors in a way that allows interleaved | |
268 execution. In this case, you can always fall back to performing multiple traversals: | |
269 | |
270 ```php | |
271 $traverserA = new NodeTraverser; | |
272 $traverserA->addVisitor($visitorA); | |
273 $traverserB = new NodeTraverser; | |
274 $traverserB->addVisitor($visitorB); | |
275 $stmts = $traverserA->traverser($stmts); | |
276 $stmts = $traverserB->traverser($stmts); | |
277 ``` | |
278 | |
279 When using multiple visitors, it is important to understand how they interact with the various | |
280 special enterNode/leaveNode return values: | |
281 | |
282 * If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all* | |
283 visitors. | |
284 * If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors. | |
285 * If a visitor returns a replacement node, subsequent visitors will be passed the replacement node, | |
286 not the original one. | |
287 * If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node. | |
288 * If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node | |
289 that was replaced, nor the replacement nodes. | |
290 | |
291 Simple node finding | |
292 ------------------- | |
293 | |
294 While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome | |
295 for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either | |
296 satisfy a certain callback, or which are instanced of a certain node type. A couple of examples are | |
297 shown in the following: | |
298 | |
299 ```php | |
300 use PhpParser\{Node, NodeFinder}; | |
301 | |
302 $nodeFinder = new NodeFinder; | |
303 | |
304 // Find all class nodes. | |
305 $classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class); | |
306 | |
307 // Find all classes that extend another class | |
308 $extendingClasses = $nodeFinder->findInstanceOf($stmts, function(Node $node) { | |
309 return $node instanceof Node\Stmt\Class_ | |
310 && $node->extends !== null; | |
311 }); | |
312 | |
313 // Find first class occuring in the AST. Returns null if no class exists. | |
314 $class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class); | |
315 | |
316 // Find first class that has name $name | |
317 $class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) { | |
318 return $node instanceof Node\Stmt\Class_ | |
319 && $node->resolvedName->toString() === $name; | |
320 }); | |
321 ``` | |
322 | |
323 Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a | |
324 common use case. | |
325 | |
326 Parent and sibling references | |
327 ----------------------------- | |
328 | |
329 The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should | |
330 be accessed: From parents to children. However, it can often be convenient to operate in the | |
331 reverse direction: When working on a node, you might want to check if the parent node satisfies a | |
332 certain property. | |
333 | |
334 PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily | |
335 emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information. |