isophonics-drupal-site: vendor/nikic/php-parser/doc/component/Lexer.markdown annotate

annotate vendor/nikic/php-parser/doc/component/Lexer.markdown @ 19:fa3358dc1485 tip

Add ndrum files

author	Chris Cannam
date	Wed, 28 Aug 2019 13:14:47 +0100
parents	5fb285c0d0e3
children

rev	line source
Chris@0	1 Lexer component documentation
Chris@0	2 =============================
Chris@0	3
Chris@0	4 The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and
Chris@0	5 `PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of
Chris@0	6 newer PHP versions and thus allows parsing of new code on older versions.
Chris@0	7
Chris@0	8 This documentation discusses options available for the default lexers and explains how lexers can be extended.
Chris@0	9
Chris@0	10 Lexer options
Chris@0	11 -------------
Chris@0	12
Chris@0	13 The two default lexers accept an `$options` array in the constructor. Currently only the `'usedAttributes'` option is
Chris@0	14 supported, which allows you to specify which attributes will be added to the AST nodes. The attributes can then be
Chris@0	15 accessed using the `$node->getAttribute()`, `$node->setAttribute()`, `$node->hasAttribute()` and `$node->getAttributes()`
Chris@0	16 methods. A sample options array:
Chris@0	17
Chris@0	18 ```php
Chris@0	19 $lexer = new PhpParser\Lexer(array(
Chris@0	20     'usedAttributes' => array(
Chris@0	21         'comments', 'startLine', 'endLine'
Chris@0	22     )
Chris@0	23 ));
Chris@0	24 ```
Chris@0	25
Chris@0	26 The attributes used in this example match the default behavior of the lexer. The following attributes are supported:
Chris@0	27
Chris@0	28 * `comments`: Array of `PhpParser\Comment` or `PhpParser\Comment\Doc` instances, representing all comments that occurred
Chris@0	29 between the previous non-discarded token and the current one. Use of this attribute is required for the
Chris@13	30 `$node->getComments()` and `$node->getDocComment()` methods to work. The attribute is also needed if you wish the pretty
Chris@13	31 printer to retain comments present in the original code.
Chris@0	32 * `startLine`: Line in which the node starts. This attribute is required for `$node->getLine()` to work. It is also
Chris@0	33 required if syntax errors should contain line number information.
Chris@13	34 * `endLine`: Line in which the node ends. Required for `$node->getEndLine()`.
Chris@13	35 * `startTokenPos`: Offset into the token array of the first token in the node. Required for `$node->getStartTokenPos()`.
Chris@13	36 * `endTokenPos`: Offset into the token array of the last token in the node. Required for `$node->getEndTokenPos()`.
Chris@13	37 * `startFilePos`: Offset into the code string of the first character that is part of the node. Required for `$node->getStartFilePos()`.
Chris@13	38 * `endFilePos`: Offset into the code string of the last character that is part of the node. Required for `$node->getEndFilePos()`.
Chris@0	39
Chris@0	40 ### Using token positions
Chris@0	41
Chris@13	42 > Note: The example in this section is outdated, because this information is now directly available in the AST: While
Chris@13	43 > `$property->isPublic()` does not distinguish between `public` and `var`, checking whether
Chris@13	44 > `($property->flags & Class_::VISIBILITY_MODIFIER_MASK) === 0` holds makes this distinction without resorting to
Chris@13	45 > tokens. However, the general idea behind the example still applies in other cases.
Chris@13	46
Chris@0	47 The token offset information is useful if you wish to examine the exact formatting used for a node. For example the AST
Chris@0	48 does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this
Chris@0	49 information based on the token position:
Chris@0	50
Chris@0	51 ```php
Chris@0	52 function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) {
Chris@0	53     $i = $prop->getAttribute('startTokenPos');
Chris@0	54     return $tokens[$i][0] === T_VAR;
Chris@0	55 }
Chris@0	56 ```
Chris@0	57
Chris@0	58 In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using
Chris@0	59 code similar to the following:
Chris@0	60
Chris@0	61 ```php
Chris@0	62 class MyNodeVisitor extends PhpParser\NodeVisitorAbstract {
Chris@0	63     private $tokens;
Chris@0	64     public function setTokens(array $tokens) {
Chris@0	65         $this->tokens = $tokens;
Chris@0	66     }
Chris@0	67
Chris@0	68     public function leaveNode(PhpParser\Node $node) {
Chris@0	69         if ($node instanceof PhpParser\Node\Stmt\Property) {
Chris@0	70             var_dump(isDeclaredUsingVar($this->tokens, $node));
Chris@0	71         }
Chris@0	72     }
Chris@0	73 }
Chris@0	74
Chris@0	75 $lexer = new PhpParser\Lexer(array(
Chris@0	76     'usedAttributes' => array(
Chris@0	77         'comments', 'startLine', 'endLine', 'startTokenPos', 'endTokenPos'
Chris@0	78     )
Chris@0	79 ));
Chris@13	80 $parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7, $lexer);
Chris@0	81
Chris@0	82 $visitor = new MyNodeVisitor();
Chris@0	83 $traverser = new PhpParser\NodeTraverser();
Chris@0	84 $traverser->addVisitor($visitor);
Chris@0	85
Chris@0	86 try {
Chris@0	87     $stmts = $parser->parse($code);
Chris@0	88     $visitor->setTokens($lexer->getTokens());
Chris@0	89     $stmts = $traverser->traverse($stmts);
Chris@0	90 } catch (PhpParser\Error $e) {
Chris@0	91     echo 'Parse Error: ', $e->getMessage();
Chris@0	92 }
Chris@0	93 ```
Chris@0	94
Chris@0	95 The same approach can also be used to perform specific modifications in the code, without changing the formatting in
Chris@0	96 other places (which is the case when using the pretty printer).
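
As a rough illustration of such a formatting-preserving modification, the sketch below rewrites `var` to `public` by editing the token stream and concatenating the token texts again. It uses PHP's built-in `token_get_all()` directly rather than the lexer, so that it runs without the library; `replaceVarWithPublic()` is a hypothetical helper name and not part of PhpParser.

```php
<?php
// Hypothetical helper (not part of PhpParser): rewrites `var` to `public`
// by editing tokens, leaving every other byte of the source untouched.
function replaceVarWithPublic(string $code): string {
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens have the form [id, text, line].
            $out .= $token[0] === T_VAR ? 'public' : $token[1];
        } else {
            // Single-character tokens such as '{' or ';' are plain strings.
            $out .= $token;
        }
    }
    return $out;
}

$code = "<?php\nclass A {\n    var   \$x;   // spacing and comment are kept\n}\n";
echo replaceVarWithPublic($code);
```

Because the token texts are concatenated unchanged, whitespace and comments survive exactly as written, which is not the case when the code is regenerated by the pretty printer.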
Chris@0	97
Chris@0	98 Lexer extension
Chris@0	99 ---------------
Chris@0	100
Chris@0	101 A lexer has to define the following public interface:
Chris@0	102
Chris@13	103 ```php
Chris@13	104 function startLexing(string $code, ErrorHandler $errorHandler = null): void;
Chris@13	105 function getTokens(): array;
Chris@13	106 function handleHaltCompiler(): string;
Chris@13	107 function getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null): int;
Chris@13	108 ```
Chris@0	109
Chris@13	110 The `startLexing()` method is invoked whenever the `parse()` method of the parser is called and is passed the source
Chris@13	111 code that is to be lexed (including the opening tag). It can be used to reset state or preprocess the source code or tokens. The
Chris@13	112 passed `ErrorHandler` should be used to report lexing errors.
Chris@0	113
Chris@0	114 The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. This method is not
Chris@0	115 used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes.
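
For readers unfamiliar with the `token_get_all()` format, the following sketch dumps the tokens of a small snippet. It calls the PHP built-in directly; the array returned by `getTokens()` is assumed to have the same shape, and the `startTokenPos` and `endTokenPos` attributes are offsets into such an array.

```php
<?php
// Each entry is either a plain one-character string (e.g. '=' or ';')
// or an array of the form [token id, token text, line number].
$tokens = token_get_all('<?php $a = 1;');

foreach ($tokens as $i => $token) {
    if (is_array($token)) {
        [$id, $text, $line] = $token;
        printf("%d: %s %s (line %d)\n", $i, token_name($id), var_export($text, true), $line);
    } else {
        printf("%d: %s\n", $i, var_export($token, true));
    }
}
```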
Chris@0	116
Chris@0	117 The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the
Chris@0	118 remaining string after the construct (not including `();`).
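
The expected return value can be pictured with a deliberately naive, library-free sketch. `remainderAfterHaltCompiler()` is a hypothetical helper that works on the raw source string with a regular expression; it is not how the lexer is implemented, it ignores the variant where the construct is terminated by a closing tag instead of `;`, and it would also match the text inside strings or comments.

```php
<?php
// Hypothetical helper: returns everything after `__halt_compiler();`,
// or null if the construct does not occur in the source string.
function remainderAfterHaltCompiler(string $code): ?string {
    $pattern = '~__halt_compiler\s*\(\s*\)\s*;~i';
    if (!preg_match($pattern, $code, $match, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    // With PREG_OFFSET_CAPTURE, $match[0] is [matched text, byte offset].
    [$text, $offset] = $match[0];
    return substr($code, $offset + strlen($text));
}

$code = "<?php echo 'hi'; __halt_compiler();raw payload";
var_dump(remainderAfterHaltCompiler($code));
```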
Chris@0	119
Chris@0	120 The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more
Chris@0	121 tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore the string content of the
Chris@0	122 token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser).
Chris@0	123
Chris@0	124 ### Attribute handling
Chris@0	125
Chris@0	126 The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be
Chris@0	127 assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the
Chris@0	128 node and the `$endAttributes` from the last token that is part of the node.
Chris@0	129
Chris@0	130 E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the
Chris@0	131 `T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token.
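
The combination rule can be sketched with plain arrays. The token list, the attribute values and the `nodeAttributes()` helper below are all hypothetical and only model the behavior described above; the parser's internal bookkeeping may differ.

```php
<?php
// Hypothetical per-token attributes, as a lexer might report them for
// `function foo() { ... }` (only the first and last token matter here).
$tokens = [
    ['text' => 'function',
     'start' => ['startLine' => 3, 'startTokenPos' => 10],
     'end'   => ['endLine' => 3, 'endTokenPos' => 10]],
    ['text' => 'foo',
     'start' => ['startLine' => 3, 'startTokenPos' => 12],
     'end'   => ['endLine' => 3, 'endTokenPos' => 12]],
    ['text' => '}',
     'start' => ['startLine' => 7, 'startTokenPos' => 25],
     'end'   => ['endLine' => 7, 'endTokenPos' => 25]],
];

// Start attributes come from the first token of the node,
// end attributes from the last token of the node.
function nodeAttributes(array $tokens): array {
    $first = $tokens[0];
    $last = $tokens[count($tokens) - 1];
    return $first['start'] + $last['end'];
}

var_dump(nodeAttributes($tokens));
```

In this model the resulting node starts on line 3 at token 10 and ends on line 7 at token 25; the attributes of the tokens in between are not used.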
Chris@0	132
Chris@0	133 An application of custom attributes is storing the exact original formatting of literals: While the parser does retain
Chris@0	134 some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it
Chris@0	135 does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). This
Chris@0	136 can be remedied by storing the original value in an attribute:
Chris@0	137
Chris@0	138 ```php
Chris@0	139 use PhpParser\Lexer;
Chris@0	140 use PhpParser\Parser\Tokens;
Chris@0	141
Chris@0	142 class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative
Chris@0	143 {
Chris@0	144     public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
Chris@0	145         $tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
Chris@0	146
Chris@0	147         if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING   // non-interpolated string
Chris@0	148             || $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string
Chris@0	149             || $tokenId == Tokens::T_LNUMBER                 // integer
Chris@0	150             || $tokenId == Tokens::T_DNUMBER                 // floating point number
Chris@0	151         ) {
Chris@0	152             // could also use $startAttributes, doesn't really matter here
Chris@0	153             $endAttributes['originalValue'] = $value;
Chris@0	154         }
Chris@0	155
Chris@0	156         return $tokenId;
Chris@0	157     }
Chris@0	158 }
Chris@0	159 ```
