The DOT Language

The following is an abstract grammar defining the DOT language. Terminals are shown in bold font and nonterminals in italics. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives.

graph      : [ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
stmt_list  : [ stmt [ ';' ] [ stmt_list ] ]
stmt       : node_stmt
           | edge_stmt
           | attr_stmt
           | ID '=' ID
           | subgraph
attr_stmt  : (graph | node | edge) attr_list
attr_list  : '[' [ a_list ] ']' [ attr_list ]
a_list     : ID [ '=' ID ] [ ',' ] [ a_list ]
edge_stmt  : (node_id | subgraph) edgeRHS [ attr_list ]
edgeRHS    : edgeop (node_id | subgraph) [ edgeRHS ]
node_stmt  : node_id [ attr_list ]
node_id    : ID [ port ]
port       : ':' ID [ ':' compass_pt ]
           | ':' compass_pt
subgraph   : [ subgraph [ ID ] ] '{' stmt_list '}'
compass_pt : (n | ne | e | se | s | sw | w | nw | c | _)

The keywords node, edge, graph, digraph, subgraph, and strict are case-independent. Note also that the allowed compass point values are not keywords, so these strings can be used elsewhere as ordinary identifiers and, conversely, the parser will actually accept any identifier.

An ID is one of the following:

  • any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores ('_') or digits ([0-9]), not beginning with a digit;
  • a number [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? );
  • any double-quoted string ("...") possibly containing escaped quotes (\") (see note 1);
  • an HTML string (<...>).

    Note that in HTML strings, angle brackets must occur in matched pairs, and unescaped newlines are allowed. In addition, the content must be legal XML, so that the special XML escape sequences for ", &, <, and > may be necessary in order to embed these characters in attribute values or raw text.
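
The four ID forms can be sketched as a small classifier. This is only an illustration transcribed from the list above, not the actual DOT scanner: the function names are hypothetical, the octal range \200-\377 is approximated by the code points U+0080 to U+00FF, and the HTML check only verifies that angle brackets are matched, not that the content is legal XML.

```python
import re

# Patterns transcribed from the ID forms listed above (illustrative only).
NAME = re.compile(r"[A-Za-z_\x80-\xff][A-Za-z_0-9\x80-\xff]*")
NUMBER = re.compile(r"-?(\.[0-9]+|[0-9]+(\.[0-9]*)?)")
# A quoted string: \" is an escaped quote; any other backslash is ordinary.
QUOTED = re.compile(r'"(?:\\"|[^"\\]|\\(?!"))*"')


def is_html_string(token: str) -> bool:
    """True if token is <...> with angle brackets in matched pairs."""
    if len(token) < 2 or token[0] != "<":
        return False
    depth = 0
    for i, ch in enumerate(token):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth < 0:
                return False
            # The outermost bracket must close at the very end.
            if depth == 0 and i != len(token) - 1:
                return False
    return depth == 0


def classify_id(token: str) -> str:
    """Return which of the four ID forms a token matches, or 'invalid'."""
    if NAME.fullmatch(token):
        return "name"
    if NUMBER.fullmatch(token):
        return "number"
    if QUOTED.fullmatch(token):
        return "quoted"
    if is_html_string(token):
        return "html"
    return "invalid"
```

For instance, classify_id("node_1") gives "name", while classify_id("1node") gives "invalid" because a name may not begin with a digit.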

    Both quoted strings and HTML strings are scanned as a unit, so any embedded comments will be treated as part of the strings.

    An edgeop is -> in directed graphs and -- in undirected graphs.
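
As a rough illustration of how the graph keyword and the edgeop go together, the sketch below emits DOT text from a list of edges. The helper names quote_id and to_dot are hypothetical, and every ID is written as a double-quoted string for simplicity.

```python
def quote_id(name: str) -> str:
    """Write a name as a double-quoted ID, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, directed=True, name="G"):
    """Emit a graph whose keyword and edgeop match its directedness."""
    keyword = "digraph" if directed else "graph"
    edgeop = "->" if directed else "--"
    lines = [f"{keyword} {quote_id(name)} {{"]
    for tail, head in edges:
        lines.append(f"  {quote_id(tail)} {edgeop} {quote_id(head)};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("a", "b"), ("b", "c")]))
print(to_dot([("a", "b")], directed=False))
```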

    An a_list clause of the form ID is equivalent to ID=true.

    The language supports C++-style comments: /* */ and //. In addition, a line beginning with a '#' character is considered a line output from a C preprocessor (e.g., # 34 to indicate line 34) and discarded.

    Semicolons aid readability but are not required except in the rare case that a named subgraph with no body immediately precedes an anonymous subgraph, since the precedence rules cause this sequence to be parsed as a subgraph with a heading and a body. Also, any amount of whitespace may be inserted between terminals.

    As another aid for readability, dot allows single logical lines to span multiple physical lines using the standard C convention of a backslash immediately preceding a newline character. In addition, double-quoted strings can be concatenated using a '+' operator. As HTML strings can contain newline characters, they do not support the concatenation operator.
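
The two conventions can be approximated in a few lines. This is a simplified sketch with hypothetical function names: it assumes the input contains no comments or HTML strings, and it treats \" as the only escape inside a quoted string.

```python
import re

# One double-quoted string; \" is an escaped quote, other backslashes are ordinary.
QSTRING = r'"(?:\\"|[^"\\]|\\(?!"))*"'
# A quoted string followed by zero or more '+ "..."' continuations.
CONCAT = re.compile(rf"{QSTRING}(?:\s*\+\s*{QSTRING})*")


def join_continuations(text: str) -> str:
    """Splice physical lines that end in a backslash into one logical line."""
    return text.replace("\\\n", "")


def merge_concatenations(text: str) -> str:
    """Rewrite "a" + "b" as the single quoted string "ab"."""
    def merge(match):
        parts = re.findall(QSTRING, match.group(0))
        return '"' + "".join(part[1:-1] for part in parts) + '"'
    return CONCAT.sub(merge, text)


print(merge_concatenations('a [label="first " + "second"];'))
# prints: a [label="first second"];
```

The pattern matches every quoted string, including ones with no '+', so that the scan never starts in the middle of a string.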

    Semantic Notes

    If a default attribute is defined using a node, edge, or graph statement, or by an attribute assignment not attached to a node or edge, any object of the appropriate type defined afterwards will inherit this attribute value. This holds until the default attribute is set to a new value, from which point the new value is used. Objects defined before a default attribute is set will have an empty string value attached to the attribute once the default attribute definition is made.
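
The ordering rule can be modeled with a toy simulation. The statement encoding and the function name below are assumptions made for illustration; the sketch only tracks node defaults and is not how any layout program stores attributes.

```python
def apply_statements(statements):
    """Replay statements in order and return each node's attributes.

    Each statement is either ("default", attr, value), standing for a
    statement such as  node [attr=value],  or ("node", name).
    """
    defaults = {}
    nodes = {}
    for stmt in statements:
        if stmt[0] == "default":
            _, attr, value = stmt
            # Nodes that already exist get the empty string, not the new default.
            for attrs in nodes.values():
                attrs.setdefault(attr, "")
            defaults[attr] = value
        else:
            _, name = stmt
            # A new node inherits the defaults in force at its definition.
            nodes.setdefault(name, dict(defaults))
    return nodes


result = apply_statements([
    ("node", "a"),
    ("default", "color", "red"),
    ("node", "b"),
    ("default", "color", "blue"),
    ("node", "c"),
])
print(result)
# {'a': {'color': ''}, 'b': {'color': 'red'}, 'c': {'color': 'blue'}}
```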

    Note, in particular, that a subgraph receives the attribute settings of its parent graph at the time of its definition. This can be useful; for example, one can assign a font to the root graph and all subgraphs will also use the font. For some attributes, however, this property is undesirable. If one attaches a label to the root graph, it is probably not the desired effect to have the label used by all subgraphs. Rather than listing the graph attribute at the top of the graph and then resetting the attribute as needed in the subgraphs, one can simply defer the attribute definition in the graph until the appropriate subgraphs have been defined.

    If an edge belongs to a cluster, its endpoints belong to that cluster. Thus, where you put an edge can affect a layout, as clusters are sometimes laid out recursively.

    Character encodings

    The DOT language assumes at least the ASCII character set. Quoted strings, both ordinary and HTML-like, may contain non-ASCII characters. In most cases, these strings are uninterpreted: they simply serve as unique identifiers or values passed through untouched. Labels, however, are meant to be displayed, which requires that the software be able to compute the size of the text and determine the appropriate glyphs. For this, it needs to know what character encoding is used.

    By default, DOT assumes the UTF-8 character encoding. It also accepts the Latin1 (ISO-8859-1) character set, assuming the input graph uses the charset attribute to specify this. For graphs using other character sets, there are usually programs, such as iconv, which will translate from one character set to another.
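
Where iconv is not at hand, the same kind of translation can be done with a few lines of script. The sketch below re-encodes Latin1 bytes as UTF-8; the function name is hypothetical, and the source encoding is an assumption the caller must get right.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8, the encoding DOT assumes by default."""
    return data.decode("iso-8859-1").encode("utf-8")


# The Latin1 byte 0xE9 (e with acute accent) becomes the two-byte UTF-8 sequence C3 A9.
print(latin1_to_utf8(b'label="caf\xe9"'))
```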

    Another way to avoid non-ASCII characters in labels is to use HTML entities for special characters. During label evaluation, these entities are translated into the underlying character. A separate table lists the supported entities, with their Unicode value, a typical glyph, and the HTML entity name. Thus, to include a lower-case Greek beta in a string, one can use the ASCII sequence &beta;. In general, one should only use entities that are allowed in the output character set, and for which there is a glyph in the font.
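
To preview what an entity stands for, Python's standard html module performs a comparable translation. Note the assumption: html.unescape knows the full HTML5 entity set, which may be larger than the set the layout software supports, so this is a rough check rather than a model of label evaluation.

```python
import html

label = "rate &beta; &lt; 1"
# &beta; becomes the Greek small letter beta (U+03B2); &lt; becomes '<'.
print(html.unescape(label))
```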


    1. In quoted strings in DOT, the only escaped character is double-quote ("). That is, in quoted strings, the dyad \" is converted to "; all other characters are left unchanged. In particular, \\ remains \\. Layout engines may apply additional escape sequences.
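
Read literally, the rule in this note amounts to a single substitution on the text between the enclosing quotes. The function name below is hypothetical, and the sketch does not model any additional escapes a layout engine might apply later.

```python
def dot_unescape(body: str) -> str:
    """Convert the dyad \\" to "; leave every other character unchanged."""
    return body.replace('\\"', '"')


print(dot_unescape('say \\"hi\\"'))  # prints: say "hi"
print(dot_unescape("a\\\\b"))        # prints: a\\b
```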