The DOT Language

The following is an abstract grammar defining the DOT language. Terminals are shown in bold font and nonterminals in italics. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives.

graph      : [ strict ] (graph | digraph) [ ID ] '{' stmt_list '}'
stmt_list  : [ stmt [ ';' ] [ stmt_list ] ]
stmt       : node_stmt
           | edge_stmt
           | attr_stmt
           | ID '=' ID
           | subgraph
attr_stmt  : (graph | node | edge) attr_list
attr_list  : '[' [ a_list ] ']' [ attr_list ]
a_list     : ID [ '=' ID ] [ ',' ] [ a_list ]
edge_stmt  : (node_id | subgraph) edgeRHS [ attr_list ]
edgeRHS    : edgeop (node_id | subgraph) [ edgeRHS ]
node_stmt  : node_id [ attr_list ]
node_id    : ID [ port ]
port       : ':' ID [ ':' compass_pt ]
           | ':' compass_pt
subgraph   : [ subgraph [ ID ] ] '{' stmt_list '}'
compass_pt : (n | ne | e | se | s | sw | w | nw | c | _)
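
As a rough illustration of how these productions combine, the following sketch uses each kind of statement once. The graph name, node names, and attribute values are made up for the example.

```dot
strict digraph example {
    rankdir = LR;                 // ID '=' ID
    node [shape = box];           // attr_stmt
    a [label = "start"];          // node_stmt with an attr_list
    a -> b -> c [color = blue];   // edge_stmt with a chained edgeRHS
    subgraph cluster_x {          // subgraph with a name
        d; e
    }
    c -> { d e };                 // anonymous subgraph as an edge endpoint
    a:s -> e:n;                   // node_id with a port (compass point)
}
```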

The keywords node, edge, graph, digraph, subgraph, and strict are case-independent. Note also that the allowed compass point values are not keywords, so these strings can be used elsewhere as ordinary identifiers and, conversely, the parser will actually accept any identifier.
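
A small sketch of both points, assuming nothing beyond the rules just stated: the keywords are written in mixed case, and the strings n and s serve both as node names and as compass points.

```dot
DiGraph G {
    NODE [shape = circle];   // same as: node [shape = circle]
    n -> s;                  // n and s are ordinary node names here
    n:s -> s:n;              // after the colon they act as compass points
}
```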

An ID is one of the following:

  • Any string of alphabetic characters ([a-zA-Z\200-\377]), underscores ('_'), or digits ([0-9]), not beginning with a digit;
  • a number [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? );
  • any double-quoted string ("...") possibly containing escaped quotes (\") [1];
  • an HTML string (<...>).

Note that in HTML strings, angle brackets must occur in matched pairs, and unescaped newlines are allowed. In addition, the content must be legal XML, so that the special XML escape sequences for ", &, <, and > may be necessary in order to embed these characters in attribute values or raw text.
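
The four forms of ID can be sketched as follows. The names and label text are illustrative only, and the HTML label assumes a Graphviz build with HTML-label support.

```dot
graph ids {
    node_1;                 // alphabetic characters, digits, and an underscore
    3.14;                   // a number used as a node name
    -.5;                    // a number with a leading minus and no integer part
    "say \"hi\"";           // double-quoted string with escaped quotes
    h [label = <<FONT COLOR="red">x</FONT> &lt; y>];   // HTML string; < is written as &lt;
}
```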

Both quoted strings and HTML strings are scanned as a unit, so any embedded comments will be treated as part of the strings.
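
For instance, in the following sketch (node names and label text invented for illustration), the comment markers inside the quoted strings are kept as label text:

```dot
graph G {
    a [label = "not /* a comment */ here"];   // the label keeps the /* ... */ text
    b [label = "http://example.org"];         // the // inside the string starts no comment
}
```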

An edgeop is -> in directed graphs and -- in undirected graphs.
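
A minimal pair of graphs, with arbitrary names, shows the two operators side by side:

```dot
digraph D {
    a -> b;    // directed edge
}

graph U {
    a -- b;    // undirected edge
}
```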

An a_list clause of the form ID is equivalent to ID=true.

The language supports C++-style comments: /* */ and //. In addition, a line beginning with a '#' character is considered a line output from a C preprocessor (e.g., # 34 to indicate line 34) and discarded.
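
The three kinds of ignored text might appear together as in this sketch; the file name on the '#' line is hypothetical.

```dot
# 34 "example.gv"
/* a block comment
   spanning two lines */
graph G {
    a -- b;   // a comment running to the end of the line
}
```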

Semicolons aid readability but are not required except in the rare case that a named subgraph with no body immediately precedes an anonymous subgraph, since the precedence rules cause this sequence to be parsed as a subgraph with a heading and a body. Also, any amount of whitespace may be inserted between terminals.

As another aid for readability, dot allows single logical lines to span multiple physical lines using the standard C convention of a backslash immediately preceding a newline character. In addition, double-quoted strings can be concatenated using a '+' operator. As HTML strings can contain newline characters, they do not support the concatenation operator.
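
Both conventions can be sketched with two labels (the text is invented). The first label is the concatenation of its two parts; in the second, the backslash and the newline after it are dropped.

```dot
graph G {
    a [label = "first half, " + "second half"];
    b [label = "a long label that is \
continued on the next physical line"];
}
```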

Semantic Notes

If a default attribute is defined using a node, edge, or graph statement, or by an attribute assignment not attached to a node or edge, any object of the appropriate type defined afterwards will inherit this attribute value. This holds until the default attribute is set to a new value, from which point the new value is used. Objects defined before a default attribute is set will have an empty string value attached to the attribute once the default attribute definition is made.
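
The ordering rule can be illustrated with a sketch such as the following, where the node names and colors are arbitrary:

```dot
digraph G {
    a;                      // defined before any default: its color attribute stays empty
    node [color = red];
    b;                      // inherits color = red
    node [color = blue];
    c;                      // inherits color = blue
}
```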

Note, in particular, that a subgraph receives the attribute settings of its parent graph at the time of its definition. This can be useful; for example, one can assign a font to the root graph and all subgraphs will also use the font. For some attributes, however, this property is undesirable. If one attaches a label to the root graph, it is probably not the desired effect to have the label used by all subgraphs. Rather than listing the graph attribute at the top of the graph, and then resetting the attribute as needed in the subgraphs, one can simply defer the attribute definition in the graph until the appropriate subgraphs have been defined.
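
A sketch of the deferral technique, with made-up cluster names, font, and label: the font is set first so that the clusters inherit it, while the label is set last so that they do not.

```dot
digraph G {
    fontname = "Helvetica";          // set early: both clusters inherit the font
    subgraph cluster_0 { a; b; }
    subgraph cluster_1 { c; d; }
    label = "Whole graph";           // set late: the clusters are already defined
}
```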

If an edge belongs to a cluster, its endpoints belong to that cluster. Thus, where you put an edge can affect a layout, as clusters are sometimes laid out recursively.
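
In the following sketch (names are illustrative), a and b belong to cluster_0 because the edge between them is declared there. The edge b -> c is declared at the top level, so it adds no cluster membership; moving it inside cluster_0 would also make c a member of cluster_0.

```dot
digraph G {
    subgraph cluster_0 {
        a -> b;          // a and b become members of cluster_0
    }
    subgraph cluster_1 {
        c;
    }
    b -> c;              // declared outside any cluster
}
```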

Character encodings

The DOT language assumes at least the ASCII character set. Quoted strings, both ordinary and HTML-like, may contain non-ASCII characters. In most cases, these strings are uninterpreted: they simply serve as unique identifiers or values passed through untouched. Labels, however, are meant to be displayed, which requires that the software be able to compute the size of the text and determine the appropriate glyphs. For this, it needs to know what character encoding is used.

By default, DOT assumes the UTF-8 character encoding. It also accepts the Latin1 (ISO-8859-1) character set, assuming the input graph uses the charset attribute to specify this. For graphs using other character sets, there are usually programs, such as iconv, which will translate from one character set to another.
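
A graph stored in Latin1 might declare its encoding as in this sketch. The value "latin1" is assumed here to be an accepted spelling of the charset value, and the label is assumed to be saved in the file as Latin1 bytes.

```dot
graph G {
    charset = "latin1";      // assumed spelling; declares ISO-8859-1 input
    a [label = "café"];      // the é must be stored as a single Latin1 byte
}
```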

Another way to avoid non-ASCII characters in labels is to use HTML entities for special characters. During label evaluation, these entities are translated into the underlying character. The table of supported entities lists each entity with its Unicode value, a typical glyph, and the HTML entity name. Thus, to include a lower-case Greek beta in a string, one can use the ASCII sequence &beta;. In general, one should only use entities that are allowed in the output character set, and for which there is a glyph in the font.
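
For example, a label written entirely in ASCII can still display a Greek letter, as in this sketch (the label text is invented, and the chosen font is assumed to contain the glyph):

```dot
graph G {
    a [label = "&beta;-carotene"];   // drawn with the Greek letter beta in place of the entity
}
```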


1. In quoted strings in DOT, the only escaped character is double-quote ("). That is, in quoted strings, the two-character sequence \" is converted to "; all other characters are left unchanged. In particular, \\ remains \\. Layout engines may apply additional escape sequences.