
The DOT Language


The following is an abstract grammar defining the DOT language. Terminals are the keywords strict, graph, digraph, node, edge, and subgraph, the compass point names, ID, and edgeop; all other names (stmt_list, attr_list, and so on) are nonterminals. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives.

graph	:	[ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
stmt_list	:	[ stmt [ ';' ] [ stmt_list ] ]
stmt	:	node_stmt
	|	edge_stmt
	|	attr_stmt
	|	ID '=' ID
	|	subgraph
attr_stmt	:	(graph | node | edge) attr_list
attr_list	:	'[' [ a_list ] ']' [ attr_list ]
a_list	:	ID [ '=' ID ] [ ',' ] [ a_list ]
edge_stmt	:	(node_id | subgraph) edgeRHS [ attr_list ]
edgeRHS	:	edgeop (node_id | subgraph) [ edgeRHS ]
node_stmt	:	node_id [ attr_list ]
node_id	:	ID [ port ]
port	:	':' ID [ ':' compass_pt ]
	|	':' compass_pt
subgraph	:	[ subgraph [ ID ] ] '{' stmt_list '}'
compass_pt	:	(n | ne | e | se | s | sw | w | nw | c | _)


The keywords node, edge, graph, digraph, subgraph, and strict are case-independent. Note also that the allowed compass point values are not keywords, so these strings can be used elsewhere as ordinary identifiers and, conversely, the parser will accept any identifier where a compass point is expected.
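A minimal sketch of the keyword rule, assuming a lexer that compares bare tokens against a fixed keyword set. The names KEYWORDS and is_keyword are hypothetical; only the keyword list itself comes from the text above.

```python
# Hypothetical helper: the keyword list is from the DOT notes above;
# how a real DOT lexer stores or compares keywords is an assumption.
KEYWORDS = frozenset({"node", "edge", "graph", "digraph", "subgraph", "strict"})


def is_keyword(token: str) -> bool:
    """Return True if token is a DOT keyword, compared case-independently."""
    return token.lower() in KEYWORDS


# Compass point names such as "ne" are deliberately absent from KEYWORDS,
# so they stay ordinary identifiers (for example, usable as node names).
```

With this rule, DiGraph and STRICT are keywords, while ne is just an ID.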

An ID is one of the following:


Any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores ('_') or digits ([0-9]), not beginning with a digit;

a number [-]?(.[0-9]+ | [0-9]+(.[0-9]*)?);

any double-quoted string ("...") possibly containing escaped quotes (\")¹;

an HTML string (<...>).
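The first three ID forms can be transcribed into regular expressions. The sketch below is an illustration, not Graphviz's lexer: it applies the octal byte range \200-\377 to code points U+0080 to U+00FF (an approximation for Python strings), and it leaves HTML strings out because checking matched angle brackets needs a scanner rather than a regular expression. The function id_kind is hypothetical.

```python
import re

# Names: letters, '_', or bytes \200-\377, then also digits; no leading digit.
NAME_RE = re.compile(r"[A-Za-z_\x80-\xff][A-Za-z_\x80-\xff0-9]*")
# Numbers: [-]?(.[0-9]+ | [0-9]+(.[0-9]*)?)
NUMBER_RE = re.compile(r"-?(\.[0-9]+|[0-9]+(\.[0-9]*)?)")
# Quoted strings: \" is an escaped quote; a lone backslash is an ordinary
# character, but a backslash directly before '"' always forms the \" pair.
QUOTED_RE = re.compile(r'"(?:\\"|[^"\\]|\\(?!"))*"')


def id_kind(text: str):
    """Return 'name', 'number', or 'quoted' if text is one whole ID, else None."""
    if NAME_RE.fullmatch(text):
        return "name"
    if NUMBER_RE.fullmatch(text):
        return "number"
    if QUOTED_RE.fullmatch(text):
        return "quoted"
    return None
```

Note that 1abc is not a single ID under these rules, since a name may not begin with a digit.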

Note that in HTML strings, angle brackets must occur in matched pairs, and unescaped newlines are allowed. In addition, the content must be legal XML, so that the special XML escape sequences for ", &, <, and > may be necessary in order to embed these characters in attribute values or raw text.

Both quoted strings and HTML strings are scanned as a unit, so any embedded comments will be treated as part of the strings.
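The two rules for HTML strings (matched angle brackets, scanned as a unit) suggest a simple depth counter. This is a sketch under the assumption that the scanner only tracks bracket nesting and does not validate the XML itself; scan_html_string is a hypothetical name.

```python
def scan_html_string(src: str, start: int) -> int:
    """Return the index just past the '>' that closes the HTML string
    opening at src[start]. Brackets must nest; newlines are allowed."""
    if src[start] != "<":
        raise ValueError("HTML string must start with '<'")
    depth = 0
    for i in range(start, len(src)):
        if src[i] == "<":
            depth += 1
        elif src[i] == ">":
            depth -= 1
            if depth == 0:
                return i + 1  # everything in between, comments included, is kept
    raise ValueError("unbalanced angle brackets in HTML string")
```

For label=<<b>/* kept */</b>>, the whole <<b>/* kept */</b>> is returned as one token, so the /* kept */ text is part of the label rather than a comment.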

An edgeop is -> in directed graphs and -- in undirected graphs.
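To illustrate how the graph kind fixes the edge operator, here is a small generator that writes DOT text from a list of edges. The function render_graph is a hypothetical helper, and it assumes node names are already valid unquoted IDs.

```python
def render_graph(edges, directed=True):
    """Emit a DOT graph; the edgeop must match the graph kind."""
    kind = "digraph" if directed else "graph"
    op = "->" if directed else "--"
    body = "".join(f"  {tail} {op} {head};\n" for tail, head in edges)
    return f"{kind} {{\n{body}}}"
```

Mixing the two (for example, a -> b inside a graph) is a syntax error, which is why the operator is chosen from the same flag as the keyword.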

An a_list clause of the form ID is equivalent to ID=true.
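A sketch of the bare-ID rule, assuming unquoted IDs only and items separated by commas or whitespace (quoted values containing spaces or commas are out of scope). parse_a_list is a hypothetical name.

```python
import re


def parse_a_list(body: str) -> dict:
    """Parse the inside of '[...]' into a dict, treating a bare ID as ID=true."""
    # Remove spaces around '=' so that "color = red" stays one item.
    body = re.sub(r"\s*=\s*", "=", body.strip())
    attrs = {}
    for item in re.split(r"[,\s]+", body):
        if not item:
            continue
        key, sep, value = item.partition("=")
        attrs[key] = value if sep else "true"
    return attrs
```

For example, the attribute list [bold, color = red fontsize=12] yields bold as "true" alongside the explicit settings.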

The language supports C++-style comments: /* */ and //. In addition, a line beginning with a '#' character is considered a line output from a C preprocessor (e.g., # 34 to indicate line 34) and discarded.
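The comment rules interact with string scanning: // inside a quoted label is not a comment. The sketch below strips comments and '#' lines while copying quoted and HTML strings verbatim. It is an illustration under stated assumptions ('<' only ever opens an HTML string, and a block comment is replaced by a single space); strip_comments is a hypothetical name, not a Graphviz function.

```python
def strip_comments(src: str) -> str:
    """Remove // and /* */ comments and '#' preprocessor lines from DOT text,
    leaving quoted strings and HTML strings untouched."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        at_line_start = i == 0 or src[i - 1] == "\n"
        if ch == "#" and at_line_start:
            # C preprocessor output line: drop it but keep its newline.
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif ch == '"':
            # Quoted string: copy verbatim; only \" is an escape.
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src.startswith('\\"', j) else 1
            out.append(src[i : j + 1])
            i = j + 1
        elif ch == "<":
            # HTML string: copy through the matching '>'.
            depth, j = 0, i
            while j < n:
                if src[j] == "<":
                    depth += 1
                elif src[j] == ">":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            out.append(src[i : j + 1])
            i = j + 1
        elif src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep neighbouring tokens apart
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```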

Semicolons aid readability but are not required except in the rare case that a named subgraph with no body immediately precedes an anonymous subgraph, since the precedence rules cause this sequence to be parsed as a subgraph with a heading and a body. Also, any amount of whitespace may be inserted between terminals.

As another aid for readability, dot allows single logical lines to span multiple physical lines using the standard C convention of a backslash immediately preceding a newline character. In addition, double-quoted strings can be concatenated using a '+' operator. As HTML strings can contain newline characters, they do not support the concatenation operator.
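Both readability aids can be modeled in a few lines. This is a sketch, assuming continuation lines are joined before tokenizing and that concatenated pieces are joined as raw text with their \" escapes left in place; join_continuations and read_concatenated are hypothetical names.

```python
def join_continuations(src: str) -> str:
    """Join physical lines where a backslash immediately precedes a newline."""
    return src.replace("\\\n", "")


def read_concatenated(src: str, i: int = 0):
    """Read "a" + "b" + ... starting at src[i]; return (joined text, end index)."""
    n = len(src)
    pieces = []
    while True:
        if i >= n or src[i] != '"':
            raise ValueError("expected a double-quoted string")
        j = i + 1
        while j < n and src[j] != '"':
            j += 2 if src.startswith('\\"', j) else 1
        if j >= n:
            raise ValueError("unterminated string")
        pieces.append(src[i + 1 : j])
        i = j + 1
        k = i
        while k < n and src[k].isspace():
            k += 1
        if k < n and src[k] == "+":
            # Another piece follows; skip the '+' and any whitespace.
            i = k + 1
            while i < n and src[i].isspace():
                i += 1
            continue
        return "".join(pieces), i
```

For example, the label "Hello, " + "world" (even split across lines) reads as the single string Hello, world.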

Semantic Notes

If a default attribute is defined using a node, edge, or graph statement, or by an attribute assignment not attached to a node or edge, any object of the appropriate type defined afterwards will inherit this attribute value. This holds until the default attribute is set to a new value, from which point the new value is used. Objects defined before a default attribute is set will have an empty string value attached to the attribute once the default attribute definition is made.
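The inheritance rule can be simulated by processing statements in order. The class below is a hypothetical model of the described semantics for nodes only, not Graphviz's internal data structure; the calls in the usage note mirror the DOT input a; node [color=red]; b; node [color=blue]; c.

```python
class DefaultsModel:
    """Toy model: node defaults apply only to nodes created afterwards."""

    def __init__(self):
        self.defaults = {}  # attribute -> current default value
        self.nodes = {}     # node name -> that node's attribute dict

    def set_node_default(self, attr, value):
        # Nodes created before this point get "" for an attribute they lacked.
        for attrs in self.nodes.values():
            attrs.setdefault(attr, "")
        self.defaults[attr] = value

    def add_node(self, name):
        # A new node copies whatever defaults are in force right now.
        self.nodes.setdefault(name, dict(self.defaults))
```

Running the statements in order gives a the empty string, b red, and c blue: changing the default later does not touch b.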

Note, in particular, that a subgraph receives the attribute settings of its parent graph at the time of its definition. This can be useful; for example, one can assign a font to the root graph and all subgraphs will also use the font. For some attributes, however, this property is undesirable. If one attaches a label to the root graph, it is probably not the desired effect to have the label used by all subgraphs. Rather than listing the graph attribute at the top of the graph and then resetting the attribute as needed in the subgraphs, one can simply defer the attribute definition of the graph until the appropriate subgraphs have been defined.
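A sketch of the deferral technique when generating DOT programmatically: subgraphs are written first and the root label last, so no subgraph inherits it. The function name and the cluster_ naming scheme are assumptions for illustration (Graphviz treats subgraphs whose names begin with cluster as clusters).

```python
def render_with_deferred_label(label: str, clusters: dict) -> str:
    """clusters maps a suffix to the node names placed in that cluster."""
    lines = ["digraph G {"]
    for name, nodes in clusters.items():
        lines.append(f"  subgraph cluster_{name} {{ {' '.join(nodes)} }}")
    # Set the root label only after every subgraph has been defined.
    escaped = label.replace('"', '\\"')
    lines.append(f'  label="{escaped}";')
    lines.append("}")
    return "\n".join(lines)
```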

If an edge belongs to a cluster, its endpoints belong to that cluster. Thus, where you put an edge can affect a layout, as clusters are sometimes laid out recursively.
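The membership rule can be sketched as a set computation: each edge written inside a cluster pulls both endpoints into it. This is a simplified model that ignores nesting and the layout engines' handling of nodes that end up in several clusters; cluster_members and its input shape are hypothetical.

```python
def cluster_members(clusters: dict) -> dict:
    """clusters: name -> {"nodes": [...], "edges": [(tail, head), ...]}"""
    members = {}
    for name, body in clusters.items():
        nodes = set(body.get("nodes", ()))
        for tail, head in body.get("edges", ()):
            nodes.update((tail, head))  # edge endpoints join the cluster
        members[name] = nodes
    return members
```

So writing a -> b inside cluster_0 places b in cluster_0 even if b was never declared there.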

Character encodings

The DOT language assumes at least the ASCII character set. Quoted strings, both ordinary and HTML-like, may contain non-ASCII characters. In most cases, these strings are uninterpreted: they simply serve as unique identifiers or values passed through untouched. Labels, however, are meant to be displayed, which requires that the software be able to compute the size of the text and determine the appropriate glyphs. For this, it needs to know what character encoding is used.

By default, DOT assumes the UTF-8 character encoding. It also accepts the Latin1 (ISO-8859-1) character set, provided the input graph uses the charset attribute to specify this. For graphs using other character sets, there are usually programs, such as iconv, that will translate from one character set to another.
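Two ways to satisfy the encoding rule, sketched in Python: transcode the input to UTF-8 (the same job as iconv -f ISO-8859-1 -t UTF-8), or keep Latin-1 bytes and declare them with the charset attribute. The helper to_utf8 and the constant LATIN1_GRAPH are hypothetical; the source encoding must be known in advance, since the bytes alone do not say.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode DOT source bytes as UTF-8, DOT's default encoding."""
    return raw.decode(source_encoding).encode("utf-8")


# Alternatively, Latin-1 input may be kept as-is if the graph declares it.
LATIN1_GRAPH = 'digraph { charset="latin1"; n [label="caf\xe9"]; }'.encode("latin-1")
```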

Another way to avoid non-ASCII characters in labels is to use HTML entities for special characters. During label evaluation, these entities are translated into the underlying character. The table of supported entities gives each entity's Unicode value, a typical glyph, and the HTML entity name. Thus, to include a lower-case Greek beta in a string, one can use the ASCII sequence &beta;. In general, one should only use entities that are allowed in the output character set, and for which there is a glyph in the font.
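A sketch of rewriting a label into pure ASCII with named entities. It uses Python's HTML 4 entity table (html.entities.codepoint2name) as a stand-in for the Graphviz entity list, which is an assumption: the two lists may not match exactly. to_ascii_entities is a hypothetical name, and ASCII characters (including &) are passed through unchanged.

```python
from html.entities import codepoint2name


def to_ascii_entities(text: str) -> str:
    """Replace each non-ASCII character with its named HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            raise ValueError(f"no named entity for {ch!r}")
    return "".join(out)
```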


¹ In quoted strings in DOT, the only escaped character is double-quote ("). That is, in quoted strings, the dyad \" is converted to "; all other characters are left unchanged. In particular, \\ remains \\. Layout engines may apply additional escape sequences.
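A sketch of quoting and unquoting under this rule, assuming the lexer treats every backslash directly before a quote as the \" dyad. One consequence is that a string cannot end in a backslash, because that backslash would pair with the closing quote. quote_id and unquote_id are hypothetical names, and they ignore any extra escapes a layout engine may apply later.

```python
def quote_id(text: str) -> str:
    """Wrap text as a DOT double-quoted string; only '"' is escaped."""
    if text.endswith("\\"):
        # A trailing backslash would combine with the closing quote as \" .
        raise ValueError("a quoted DOT string cannot end with a backslash")
    return '"' + text.replace('"', '\\"') + '"'


def unquote_id(quoted: str) -> str:
    """Undo quote_id: strip the outer quotes and turn each \\" back into "."""
    return quoted[1:-1].replace('\\"', '"')
```

A doubled backslash passes through untouched, so "a\\b" keeps both backslashes.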
