---
layout: page
title: Schema Language
---

# Schema Language

Like Protocol Buffers and Thrift (but unlike JSON or MessagePack), Cap'n Proto messages are
strongly-typed and not self-describing. You must define your message structure in a special
language, then invoke the Cap'n Proto compiler (`capnp compile`) to generate source code to
manipulate that message type in your desired language.

For example:

{% highlight capnp %}
@0xdbb9ad1f14bf0b36;  # unique file ID, generated by `capnp id`

struct Person {
  name @0 :Text;
  birthdate @3 :Date;

  email @1 :Text;
  phones @2 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }
}

struct Date {
  year @0 :Int16;
  month @1 :UInt8;
  day @2 :UInt8;
}
{% endhighlight %}

Some notes:

* Types come after names. The name is by far the most important thing to see, especially when
  quickly skimming, so we put it up front where it is most visible. Sorry, C got it wrong.
* The `@N` annotations show how the protocol evolved over time, so that the system can make sure
  to maintain compatibility with older versions. Fields (and enumerants, and interface methods)
  must be numbered consecutively starting from zero in the order in which they were added. In this
  example, it looks like the `birthdate` field was added to the `Person` structure recently -- its
  number is higher than those of the `email` and `phones` fields.
  Unlike Protobufs, you cannot skip numbers when defining fields -- but there was never any reason
  to do so anyway.

## Language Reference

### Comments

Comments are indicated by hash signs and extend to the end of the line:

{% highlight capnp %}
# This is a comment.
{% endhighlight %}

Comments meant as documentation should appear _after_ the declaration, either on the same line or
on a subsequent line. Doc comments for aggregate definitions should appear on the line after the
opening brace.

{% highlight capnp %}
struct Date {
  # A standard Gregorian calendar date.

  year @0 :Int16;
  # The year. Must include the century.
  # A negative value indicates BC.

  month @1 :UInt8;   # Month number, 1-12.
  day @2 :UInt8;     # Day number, 1-31.
}
{% endhighlight %}

Placing the comment _after_ the declaration rather than before makes the code more readable,
especially when doc comments grow long. You almost always need to see the declaration before you
can start reading the comment.

### Built-in Types

The following types are automatically defined:

* **Void:** `Void`
* **Boolean:** `Bool`
* **Integers:** `Int8`, `Int16`, `Int32`, `Int64`
* **Unsigned integers:** `UInt8`, `UInt16`, `UInt32`, `UInt64`
* **Floating-point:** `Float32`, `Float64`
* **Blobs:** `Text`, `Data`
* **Lists:** `List(T)`

Notes:

* The `Void` type has exactly one possible value, and thus can be encoded in zero bits. It is
  rarely used, but can be useful as a union member.
* `Text` is always UTF-8 encoded and NUL-terminated.
* `Data` is a completely arbitrary sequence of bytes.
* `List` is a parameterized type, where the parameter is the element type. For example,
  `List(Int32)`, `List(Person)`, and `List(List(Text))` are all valid.

### Structs

A struct has a set of named, typed fields, numbered consecutively starting from zero.

{% highlight capnp %}
struct Person {
  name @0 :Text;
  email @1 :Text;
}
{% endhighlight %}

Fields can have default values:

{% highlight capnp %}
foo @0 :Int32 = 123;
bar @1 :Text = "blah";
baz @2 :List(Bool) = [ true, false, false, true ];
qux @3 :Person = (name = "Bob", email = "bob@example.com");
corge @4 :Void = void;
grault @5 :Data = 0x"a1 40 33";
{% endhighlight %}

### Unions

A union is two or more fields of a struct which are stored in the same location. Only one of
these fields can be set at a time, and a separate tag is maintained to track which one is
currently set. Unlike in C, unions are not types; they are simply properties of fields, so
union declarations do not look like types.

{% highlight capnp %}
struct Person {
  # ...

  employment :union {
    unemployed @4 :Void;
    employer @5 :Company;
    school @6 :School;
    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
}
{% endhighlight %}

Additionally, unions can be unnamed. Each struct can contain no more than one unnamed union.
Use unnamed unions in cases where you would struggle to think of an appropriate name for the union,
because the union represents the main body of the struct.

{% highlight capnp %}
struct Shape {
  area @0 :Float64;

  union {
    circle @1 :Float64;      # radius
    square @2 :Float64;      # width
  }
}
{% endhighlight %}

Notes:

* Union members are numbered in the same number space as fields of the containing struct.
  Remember that the purpose of the numbers is to indicate the evolution order of the
  struct. The system needs to know when the union fields were declared relative to the non-union
  fields.

* Notice that the `Person` example above uses the "useless" `Void` type. We don't have any extra
  information to store for the `unemployed` or `selfEmployed` cases, but we still want the union
  to distinguish these states from others.

* By default, when a struct is initialized, the lowest-numbered field in the union is "set". If
  you do not want any field set by default, simply declare a field called `unset` (typically of
  type `Void`) and make it the lowest-numbered field.

* You can move an existing field into a new union without breaking compatibility with existing
  data, as long as all of the other fields in the union are new. Since the existing field is
  necessarily the lowest-numbered in the union, it will be the union's default field.
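
To make the last point concrete, here is a minimal sketch using a hypothetical `Contact` struct that does not appear elsewhere on this page. Suppose version 1 declared only `name @0` and `email @1`; version 2 then adds a mutually exclusive `phone` field and moves `email` into an unnamed union alongside it. Because `email` keeps its number and is the lowest-numbered union member, messages written with version 1 should still read back as having `email` set. The file ID is a placeholder; generate a real one with `capnp id`.

{% highlight capnp %}
@0xb5c2e7a9d3f14e86;  # placeholder file ID; use `capnp id` to generate your own

# Version 1 of the hypothetical struct, shown for comparison:
#
#   struct Contact {
#     name @0 :Text;
#     email @1 :Text;
#   }

# Version 2: `email` is retroactively moved into an unnamed union.
struct Contact {
  name @0 :Text;

  union {
    email @1 :Text;  # existing field; lowest-numbered, so it is the default
    phone @2 :Text;  # new field, mutually exclusive with `email`
  }
}
{% endhighlight %}

The key constraint is that `phone` is new: if an already-existing field were pulled into the same union, old messages could have both fields set, which a union cannot represent.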

**Wait, why aren't unions first-class types?**

Requiring unions to be declared inside a struct, rather than living as free-standing types, has
some important advantages:

* If unions were first-class types, then union members would clearly have to be numbered separately
  from the containing type's fields. This means that the compiler, when deciding how to position
  the union in its containing struct, would have to conservatively assume that any kind of new
  field might be added to the union in the future. To support this, all unions would have to
  be allocated as separate objects embedded by pointer, wasting space.

* A free-standing union would be a liability for protocol evolution, because no additional data
  can be attached to it later on. Consider, for example, a type which represents a parser token.
  This type is naturally a union: it may be a keyword, identifier, numeric literal, quoted string,
  etc. So the author defines it as a union, and the type is used widely. Later on, the developer
  wants to attach information to the token indicating its line and column number in the source
  file. Unfortunately, this is impossible without updating all users of the type, because the new
  information ought to apply to _all_ token instances, not just specific members of the union. On
  the other hand, if unions must be embedded within structs, it is always possible to add new
  fields to the struct later on.

* When evolving a protocol, it is common to discover that some existing field really should have
  been enclosed in a union, because new fields being added are mutually exclusive with it.
  With Cap'n Proto's unions, it is actually possible to "retroactively unionize" such a field
  without changing its layout. This lets you keep reading old data without wasting space when
  writing new data. This is only possible when unions are declared within their containing struct.

Cap'n Proto's unconventional approach to unions provides these advantages without any real
downside: where you would conventionally define a free-standing union type, in Cap'n Proto you
may simply define a struct type that contains only that union (probably unnamed), and you have
achieved the same effect. Thus, aside from being slightly unintuitive, it is strictly superior.

### Groups

A group is a set of fields that are encapsulated in their own scope.

{% highlight capnp %}
struct Person {
  # ...

  # Note: This is a terrible way to use groups, and meant
  #       only to demonstrate the syntax.
  address :group {
    houseNumber @8 :UInt32;
    street @9 :Text;
    city @10 :Text;
    country @11 :Text;
  }
}
{% endhighlight %}

Interface-wise, the above group behaves as if you had defined a nested struct called `Address` and
then a field `address :Address`. However, a group is _not_ a separate object from its containing
struct: the fields are numbered in the same space as the containing struct's fields, and are laid
out exactly the same as if they hadn't been grouped at all. Essentially, a group is just a
namespace.

Groups on their own (as in the above example) are useless, almost as much so as the `Void` type.
They become interesting when used together with unions.

{% highlight capnp %}
struct Shape {
  area @0 :Float64;

  union {
    circle :group {
      radius @1 :Float64;
    }
    rectangle :group {
      width @2 :Float64;
      height @3 :Float64;
    }
  }
}
{% endhighlight %}

There are two main reasons to use groups with unions:

1. They are often more self-documenting. Notice that `radius` is now a member of `circle`, so
   we don't need a comment to explain that the value of `circle` is its radius.
2. You can add additional members later on, without breaking compatibility. Notice how we upgraded
   `square` to `rectangle` above, adding a `height` field. This definition is actually
   wire-compatible with the previous version of the `Shape` example from the "Unions" section
   (aside from the fact that `height` will always be zero when reading old data -- hey, it's not
   a perfect example). In real-world use, it is common to realize after the fact that you need to
   add some information to a struct that only applies when one particular union field is set.
   Without the ability to upgrade to a group, you would have to define the new field separately,
   and have it waste space when not relevant.

Note that a named union is actually exactly equivalent to a named group containing an unnamed
union.

**Wait, weren't groups considered a misfeature in Protobufs? Why did you do this again?**

They are useful in unions, which Protobufs did not have. Meanwhile, you cannot have a "repeated
group" in Cap'n Proto, which was the case that got into the most trouble with Protobufs.
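
Returning to the earlier note that a named union is exactly equivalent to a named group containing an unnamed union, the sketch below declares the same `employment` field both ways. `PersonA` and `PersonB` are hypothetical names, `employer` is simplified to `Text` so that the file stands alone, and the file ID is a placeholder. Given that equivalence, the two structs should end up with the same layout and the same accessors for `employment`.

{% highlight capnp %}
@0xe3a1c9f57b2d4608;  # placeholder file ID; use `capnp id` to generate your own

struct PersonA {
  # Written as a named union.
  employment :union {
    unemployed @0 :Void;
    employer @1 :Text;
  }
}

struct PersonB {
  # Written as a named group wrapping an unnamed union.
  employment :group {
    union {
      unemployed @0 :Void;
      employer @1 :Text;
    }
  }
}
{% endhighlight %}

The second form is what you would grow into if `employment` later needed a field that applies regardless of which union member is set, since a group can hold non-union fields next to its union.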

### Dynamically-typed Fields

A struct may have a field with type `AnyPointer`. This field's value can be of any pointer type --
i.e. any struct, interface, list, or blob. This is essentially like a `void*` in C.

See also [generics](#generic-types).

### Enums

An enum is a type with a small finite set of symbolic values.

{% highlight capnp %}
enum Rfc3092Variable {
  foo @0;
  bar @1;
  baz @2;
  qux @3;
  # ...
}
{% endhighlight %}

Like fields, enumerants must be numbered sequentially starting from zero. In languages where
enums have numeric values, these numbers will be used, but in general Cap'n Proto enums should not
be considered numeric.

### Interfaces

An interface has a collection of methods, each of which takes some parameters and returns some
results. Like struct fields, methods are numbered. Interfaces support inheritance, including
multiple inheritance.

{% highlight capnp %}
interface Node {
  isDirectory @0 () -> (result :Bool);
}

interface Directory extends(Node) {
  list @0 () -> (list :List(Entry));
  struct Entry {
    name @0 :Text;
    node @1 :Node;
  }

  create @1 (name :Text) -> (file :File);
  mkdir @2 (name :Text) -> (directory :Directory);
  open @3 (name :Text) -> (node :Node);
  delete @4 (name :Text);
  link @5 (name :Text, node :Node);
}

interface File extends(Node) {
  size @0 () -> (size :UInt64);
  read @1 (startAt :UInt64 = 0, amount :UInt64 = 0xffffffffffffffff)
       -> (data :Data);
  # Default params = read entire file.

  write @2 (startAt :UInt64, data :Data);
  truncate @3 (size :UInt64);
}
{% endhighlight %}

Notice something interesting here: `Node`, `Directory`, and `File` are interfaces, but several
methods take these types as parameters or return them as results. `Directory.Entry` is a struct,
but it contains a `Node`, which is an interface. Structs (and primitive types) are passed over RPC
by value, but interfaces are passed by reference. So when `Directory.list` is called remotely, the
content of a `List(Entry)` (including the text of each `name`) is transmitted back, but for the
`node` field, only a reference to some remote `Node` object is sent.

When a reference to an object is transmitted, the RPC system automatically ensures that the
recipient gets permission to call the referenced object -- because if the recipient wasn't
meant to have access, the sender shouldn't have sent the reference in the first place.
This makes it very easy to develop secure protocols with Cap'n Proto -- you almost don't need to
think about access control at all. This feature is what makes Cap'n Proto a "capability-based" RPC
system -- a reference to an object inherently represents a "capability" to access it.

### Generic Types

A struct or interface type may be parameterized, making it "generic". For example, this is useful
for defining type-safe containers:

{% highlight capnp %}
struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}

struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}
{% endhighlight %}

Cap'n Proto generics work very similarly to Java generics or C++ templates. Some notes:

* Only pointer types (structs, lists, blobs, and interfaces) can be used as generic parameters,
  much like in Java. This is a pragmatic limitation: allowing parameters to have non-pointer types
  would mean that different parameterizations of a struct could have completely different layouts,
  which would excessively complicate the Cap'n Proto implementation.

* A type declaration nested inside a generic type may use the type parameters of the outer type,
  as you can see in the example above. This differs from Java, but matches C++. If you want to
  refer to a nested type from outside the outer type, you must specify the parameters on the outer
  type, not the inner. For example, `Map(Text, Person).Entry` is a valid type;
  `Map.Entry(Text, Person)` is NOT valid. (Of course, an inner type may declare additional generic
  parameters.)

* If you refer to a generic type but omit its parameters (e.g. declare a field of type `Map` rather
  than `Map(T, U)`), it is as if you specified `AnyPointer` for each parameter. Note that such
  a type is wire-compatible with any specific parameterization, so long as you interpret the
  `AnyPointer`s as the correct type at runtime.

* Relatedly, it is safe to cast a generic interface of a specific parameterization to a generic
  interface where all parameters are `AnyPointer` and vice versa, as long as the `AnyPointer`s are
  treated as the correct type at runtime. This means, for example, that you can implement a server
  in a generic way that is correct for all parameterizations but call it from clients using a
  specific parameterization.

* The encoding of a generic type is exactly the same as the encoding of a type produced by
  substituting the type parameters manually. For example, `Map(Text, Person)` is encoded exactly
  the same as: