comparison vendor/nikic/php-parser/doc/component/Lexer.markdown @ 0:4c8ae668cc8c
Initial import (non-working)
author | Chris Cannam |
---|---|
date | Wed, 29 Nov 2017 16:09:58 +0000 |
parents | |
children | 5fb285c0d0e3 |
-1:000000000000 | 0:4c8ae668cc8c |
---|---|
1 Lexer component documentation | |
2 ============================= | |
3 | |
4 The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and | |
5 `PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of | |
6 newer PHP versions and thus allows parsing of new code on older versions. | |
7 | |
8 This documentation discusses options available for the default lexers and explains how lexers can be extended. | |
9 | |
10 Lexer options | |
11 ------------- | |
12 | |
13 The two default lexers accept an `$options` array in the constructor. Currently only the `'usedAttributes'` option is | |
14 supported, which allows you to specify which attributes will be added to the AST nodes. The attributes can then be | |
15 accessed using `$node->getAttribute()`, `$node->setAttribute()`, `$node->hasAttribute()` and `$node->getAttributes()` | |
16 methods. A sample options array: | |
17 | |
18 ```php | |
19 $lexer = new PhpParser\Lexer(array( | |
20     'usedAttributes' => array( | |
21         'comments', 'startLine', 'endLine' | |
22     ) | |
23 )); | |
24 ``` | |
25 | |
26 The attributes used in this example match the default behavior of the lexer. The following attributes are supported: | |
27 | |
28 * `comments`: Array of `PhpParser\Comment` or `PhpParser\Comment\Doc` instances, representing all comments that occurred | |
29 between the previous non-discarded token and the current one. Use of this attribute is required for the | |
30 `$node->getDocComment()` method to work. The attribute is also needed if you wish the pretty printer to retain | |
31 comments present in the original code. | |
32 * `startLine`: Line in which the node starts. This attribute is required for `$node->getLine()` to work. It is also | |
33 required if syntax errors should contain line number information. | |
34 * `endLine`: Line in which the node ends. | |
35 * `startTokenPos`: Offset into the token array of the first token in the node. | |
36 * `endTokenPos`: Offset into the token array of the last token in the node. | |
37 * `startFilePos`: Offset into the code string of the first character that is part of the node. | |
38 * `endFilePos`: Offset into the code string of the last character that is part of the node. | |
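
As a small illustration of the file position attributes, the sketch below slices the original source text of a node out of the code string. The helper name `sliceNodeSource` and the hard-coded offsets are assumptions made for this example; in real use the offsets would be read with `$node->getAttribute('startFilePos')` and `$node->getAttribute('endFilePos')`, which requires listing both attributes in `'usedAttributes'`.

```php
<?php
// Hypothetical helper (not part of the library): returns the source text
// covered by a node, given its 'startFilePos' and 'endFilePos' values.
function sliceNodeSource(string $code, int $startFilePos, int $endFilePos): string {
    // endFilePos is the offset of the last character of the node (inclusive),
    // so the length is end - start + 1.
    return substr($code, $startFilePos, $endFilePos - $startFilePos + 1);
}

$code = '<?php echo 1 + 2;';

// Assumed offsets of the expression `1 + 2` inside $code.
echo sliceNodeSource($code, 11, 15), "\n"; // prints "1 + 2"
```

Because `endFilePos` is inclusive, forgetting the `+ 1` would drop the last character of the node.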
39 | |
40 ### Using token positions | |
41 | |
42 The token offset information is useful if you wish to examine the exact formatting used for a node. For example, the AST | |
43 does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this | |
44 information based on the token position: | |
45 | |
46 ```php | |
47 function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) { | |
48     $i = $prop->getAttribute('startTokenPos'); | |
49     return $tokens[$i][0] === T_VAR; | |
50 } | |
51 ``` | |
52 | |
53 In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using | |
54 code similar to the following: | |
55 | |
56 ```php | |
57 class MyNodeVisitor extends PhpParser\NodeVisitorAbstract { | |
58     private $tokens; | |
59     public function setTokens(array $tokens) { | |
60         $this->tokens = $tokens; | |
61     } | |
62 | |
63     public function leaveNode(PhpParser\Node $node) { | |
64         if ($node instanceof PhpParser\Node\Stmt\Property) { | |
65             var_dump(isDeclaredUsingVar($this->tokens, $node)); | |
66         } | |
67     } | |
68 } | |
69 | |
70 $lexer = new PhpParser\Lexer(array( | |
71     'usedAttributes' => array( | |
72         'comments', 'startLine', 'endLine', 'startTokenPos', 'endTokenPos' | |
73     ) | |
74 )); | |
75 $parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer); | |
76 | |
77 $visitor = new MyNodeVisitor(); | |
78 $traverser = new PhpParser\NodeTraverser(); | |
79 $traverser->addVisitor($visitor); | |
80 | |
81 try { | |
82     $stmts = $parser->parse($code); | |
83     $visitor->setTokens($lexer->getTokens()); | |
84     $stmts = $traverser->traverse($stmts); | |
85 } catch (PhpParser\Error $e) { | |
86     echo 'Parse Error: ', $e->getMessage(); | |
87 } | |
88 ``` | |
89 | |
90 The same approach can also be used to perform specific modifications in the code, without changing the formatting in | |
91 other places (which is the case when using the pretty printer). | |
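
A minimal sketch of such a formatting-preserving modification follows. It works directly on a `token_get_all()` array and does not use the parser; the helper name `tokensToSource` and the hard-coded token positions are assumptions for illustration. With the parser, the positions would come from the `startTokenPos` and `endTokenPos` attributes of the property node.

```php
<?php
// Hypothetical helper: concatenates the text of tokens $from..$to (inclusive)
// from an array in token_get_all() format.
function tokensToSource(array $tokens, int $from, int $to): string {
    $out = '';
    for ($i = $from; $i <= $to; $i++) {
        // A token is either an [id, text, line] array or a single-character string.
        $out .= is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];
    }
    return $out;
}

$tokens = token_get_all('<?php class A { var   $x; }');

// Assumed positions of the property declaration in the token array.
$start = 7;
$end = 10;
echo tokensToSource($tokens, $start, $end), "\n"; // prints "var   $x;"

// Replace only the text of the `var` token; all other tokens stay untouched.
if ($tokens[$start][0] === T_VAR) {
    $tokens[$start][1] = 'public';
}
echo tokensToSource($tokens, 0, count($tokens) - 1), "\n"; // prints "<?php class A { public   $x; }"
```

Only the one token is rewritten, so the unusual spacing before `$x` survives, which the pretty printer would not guarantee.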
92 | |
93 Lexer extension | |
94 --------------- | |
95 | |
96 A lexer has to define the following public interface: | |
97 | |
98     void startLexing(string $code, ErrorHandler $errorHandler = null); | |
99     array getTokens(); | |
100     string handleHaltCompiler(); | |
101     int getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null); | |
102 | |
103 The `startLexing()` method is invoked with the source code that is to be lexed (including the opening tag) whenever the | |
104 `parse()` method of the parser is called. It can be used to reset state or preprocess the source code or tokens. The | |
105 passed `ErrorHandler` should be used to report lexing errors. | |
106 | |
107 The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. This method is not | |
108 used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes. | |
109 | |
110 The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the | |
111 remaining string after the construct (not including `();`). | |
112 | |
113 The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more | |
114 tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore, the string content of the | |
115 token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser). | |
116 | |
117 ### Attribute handling | |
118 | |
119 The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be | |
120 assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the | |
121 node and the `$endAttributes` from the last token that is part of the node. | |
122 | |
123 E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the | |
124 `T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token. | |
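
The rule can be sketched with plain arrays. This is an illustration only, not the parser's actual code: the token records, their attribute values and the helper name `combineNodeAttributes` are all assumptions made for the example.

```php
<?php
// Hypothetical token records: [token name, $startAttributes, $endAttributes].
$functionTokens = [
    ['T_FUNCTION', ['startLine' => 3, 'startTokenPos' => 10], ['endLine' => 3, 'endTokenPos' => 10]],
    ['T_STRING',   ['startLine' => 3, 'startTokenPos' => 12], ['endLine' => 3, 'endTokenPos' => 12]],
    ["'{'",        ['startLine' => 4, 'startTokenPos' => 16], ['endLine' => 4, 'endTokenPos' => 16]],
    ["'}'",        ['startLine' => 7, 'startTokenPos' => 30], ['endLine' => 7, 'endTokenPos' => 30]],
];

// Hypothetical helper: builds the attributes of a node spanning $tokens.
function combineNodeAttributes(array $tokens): array {
    $first = $tokens[0];
    $last  = $tokens[count($tokens) - 1];
    // Start attributes come from the first token, end attributes from the last.
    return $first[1] + $last[2];
}

print_r(combineNodeAttributes($functionTokens));
```

In this sketch the resulting node starts on line 3 (from `T_FUNCTION`) and ends on line 7 (from `'}'`); the attributes of the tokens in between are ignored.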
125 | |
126 One application of custom attributes is storing the exact original formatting of literals: while the parser does retain | |
127 some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it | |
128 does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). This | |
129 can be remedied by storing the original value in an attribute: | |
130 | |
131 ```php | |
132 use PhpParser\Lexer; | |
133 use PhpParser\Parser\Tokens; | |
134 | |
135 class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative | |
136 { | |
137     public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) { | |
138         $tokenId = parent::getNextToken($value, $startAttributes, $endAttributes); | |
139 | |
140         if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING   // non-interpolated string | |
141             || $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string | |
142             || $tokenId == Tokens::T_LNUMBER                 // integer | |
143             || $tokenId == Tokens::T_DNUMBER                 // floating point number | |
144         ) { | |
145             // could also use $startAttributes, doesn't really matter here | |
146             $endAttributes['originalValue'] = $value; | |
147         } | |
148 | |
149         return $tokenId; | |
150     } | |
151 } | |
152 ``` |