annotate src/capnproto-0.6.0/doc/language.md @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI compatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 0994c39f1e94
children
rev   line source
cannam@62 1 ---
cannam@62 2 layout: page
cannam@62 3 title: Schema Language
cannam@62 4 ---
cannam@62 5
cannam@62 6 # Schema Language
cannam@62 7
cannam@62 8 Like Protocol Buffers and Thrift (but unlike JSON or MessagePack), Cap'n Proto messages are
cannam@62 9 strongly-typed and not self-describing. You must define your message structure in a special
cannam@62 10 language, then invoke the Cap'n Proto compiler (`capnp compile`) to generate source code to
cannam@62 11 manipulate that message type in your desired language.
cannam@62 12
cannam@62 13 For example:
cannam@62 14
cannam@62 15 {% highlight capnp %}
cannam@62 16 @0xdbb9ad1f14bf0b36; # unique file ID, generated by `capnp id`
cannam@62 17
cannam@62 18 struct Person {
cannam@62 19   name @0 :Text;
cannam@62 20   birthdate @3 :Date;
cannam@62 21
cannam@62 22   email @1 :Text;
cannam@62 23   phones @2 :List(PhoneNumber);
cannam@62 24
cannam@62 25   struct PhoneNumber {
cannam@62 26     number @0 :Text;
cannam@62 27     type @1 :Type;
cannam@62 28
cannam@62 29     enum Type {
cannam@62 30       mobile @0;
cannam@62 31       home @1;
cannam@62 32       work @2;
cannam@62 33     }
cannam@62 34   }
cannam@62 35 }
cannam@62 36
cannam@62 37 struct Date {
cannam@62 38   year @0 :Int16;
cannam@62 39   month @1 :UInt8;
cannam@62 40   day @2 :UInt8;
cannam@62 41 }
cannam@62 42 {% endhighlight %}
cannam@62 43
cannam@62 44 Some notes:
cannam@62 45
cannam@62 46 * Types come after names. The name is by far the most important thing to see, especially when
cannam@62 47 quickly skimming, so we put it up front where it is most visible. Sorry, C got it wrong.
cannam@62 48 * The `@N` annotations show how the protocol evolved over time, so that the system can make sure
cannam@62 49 to maintain compatibility with older versions. Fields (and enumerants, and interface methods)
cannam@62 50 must be numbered consecutively starting from zero in the order in which they were added. In this
cannam@62 51 example, it looks like the `birthdate` field was added to the `Person` structure recently -- its
cannam@62 52 number is higher than the `email` and `phones` fields. Unlike Protobufs, you cannot skip numbers
cannam@62 53 when defining fields -- but there was never any reason to do so anyway.
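
As a sketch of how the numbering rule plays out, suppose a later revision of the schema adds one more field to `Person`. The `nickname` field is hypothetical and exists only for illustration; the nested `PhoneNumber` declaration is omitted for brevity.

{% highlight capnp %}
struct Person {
  name @0 :Text;
  nickname @4 :Text;
  # Hypothetical field added in a later revision. It takes the next
  # unused number (4), no matter where it appears in the declaration.

  birthdate @3 :Date;
  email @1 :Text;
  phones @2 :List(PhoneNumber);
}
{% endhighlight %}

The declaration order is free to change for readability; only the `@N` numbers record the order in which fields were added.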
cannam@62 54
cannam@62 55 ## Language Reference
cannam@62 56
cannam@62 57 ### Comments
cannam@62 58
cannam@62 59 Comments are indicated by hash signs and extend to the end of the line:
cannam@62 60
cannam@62 61 {% highlight capnp %}
cannam@62 62 # This is a comment.
cannam@62 63 {% endhighlight %}
cannam@62 64
cannam@62 65 Comments meant as documentation should appear _after_ the declaration, either on the same line, or
cannam@62 66 on a subsequent line. Doc comments for aggregate definitions should appear on the line after the
cannam@62 67 opening brace.
cannam@62 68
cannam@62 69 {% highlight capnp %}
cannam@62 70 struct Date {
cannam@62 71   # A standard Gregorian calendar date.
cannam@62 72
cannam@62 73   year @0 :Int16;
cannam@62 74   # The year. Must include the century.
cannam@62 75   # Negative value indicates BC.
cannam@62 76
cannam@62 77   month @1 :UInt8; # Month number, 1-12.
cannam@62 78   day @2 :UInt8; # Day number, 1-31.
cannam@62 79 }
cannam@62 80 {% endhighlight %}
cannam@62 81
cannam@62 82 Placing the comment _after_ the declaration rather than before makes the code more readable,
cannam@62 83 especially when doc comments grow long. You almost always need to see the declaration before you
cannam@62 84 can start reading the comment.
cannam@62 85
cannam@62 86 ### Built-in Types
cannam@62 87
cannam@62 88 The following types are automatically defined:
cannam@62 89
cannam@62 90 * **Void:** `Void`
cannam@62 91 * **Boolean:** `Bool`
cannam@62 92 * **Integers:** `Int8`, `Int16`, `Int32`, `Int64`
cannam@62 93 * **Unsigned integers:** `UInt8`, `UInt16`, `UInt32`, `UInt64`
cannam@62 94 * **Floating-point:** `Float32`, `Float64`
cannam@62 95 * **Blobs:** `Text`, `Data`
cannam@62 96 * **Lists:** `List(T)`
cannam@62 97
cannam@62 98 Notes:
cannam@62 99
cannam@62 100 * The `Void` type has exactly one possible value, and thus can be encoded in zero bits. It is
cannam@62 101 rarely used, but can be useful as a union member.
cannam@62 102 * `Text` is always UTF-8 encoded and NUL-terminated.
cannam@62 103 * `Data` is a completely arbitrary sequence of bytes.
cannam@62 104 * `List` is a parameterized type, where the parameter is the element type. For example,
cannam@62 105 `List(Int32)`, `List(Person)`, and `List(List(Text))` are all valid.
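
To make the list above concrete, here is a minimal sketch of a struct that uses each kind of built-in type. The struct and its field names are hypothetical.

{% highlight capnp %}
struct Sample {
  # Hypothetical struct, for illustration only.
  nothing @0 :Void;              # exactly one possible value
  flag @1 :Bool;
  offset @2 :Int32;              # signed integer
  count @3 :UInt64;              # unsigned integer
  ratio @4 :Float64;
  label @5 :Text;                # UTF-8 text
  payload @6 :Data;              # arbitrary bytes
  matrix @7 :List(List(Int32));  # lists may nest
}
{% endhighlight %}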
cannam@62 106
cannam@62 107 ### Structs
cannam@62 108
cannam@62 109 A struct has a set of named, typed fields, numbered consecutively starting from zero.
cannam@62 110
cannam@62 111 {% highlight capnp %}
cannam@62 112 struct Person {
cannam@62 113   name @0 :Text;
cannam@62 114   email @1 :Text;
cannam@62 115 }
cannam@62 116 {% endhighlight %}
cannam@62 117
cannam@62 118 Fields can have default values:
cannam@62 119
cannam@62 120 {% highlight capnp %}
cannam@62 121 foo @0 :Int32 = 123;
cannam@62 122 bar @1 :Text = "blah";
cannam@62 123 baz @2 :List(Bool) = [ true, false, false, true ];
cannam@62 124 qux @3 :Person = (name = "Bob", email = "bob@example.com");
cannam@62 125 corge @4 :Void = void;
cannam@62 126 grault @5 :Data = 0x"a1 40 33";
cannam@62 127 {% endhighlight %}
cannam@62 128
cannam@62 129 ### Unions
cannam@62 130
cannam@62 131 A union is two or more fields of a struct which are stored in the same location. Only one of
cannam@62 132 these fields can be set at a time, and a separate tag is maintained to track which one is
cannam@62 133 currently set. Unlike in C, unions are not types; they are simply properties of fields, so
cannam@62 134 union declarations do not look like types.
cannam@62 135
cannam@62 136 {% highlight capnp %}
cannam@62 137 struct Person {
cannam@62 138   # ...
cannam@62 139
cannam@62 140   employment :union {
cannam@62 141     unemployed @4 :Void;
cannam@62 142     employer @5 :Company;
cannam@62 143     school @6 :School;
cannam@62 144     selfEmployed @7 :Void;
cannam@62 145     # We assume that a person is only one of these.
cannam@62 146   }
cannam@62 147 }
cannam@62 148 {% endhighlight %}
cannam@62 149
cannam@62 150 Additionally, unions can be unnamed. Each struct can contain no more than one unnamed union. Use
cannam@62 151 unnamed unions in cases where you would struggle to think of an appropriate name for the union,
cannam@62 152 because the union represents the main body of the struct.
cannam@62 153
cannam@62 154 {% highlight capnp %}
cannam@62 155 struct Shape {
cannam@62 156   area @0 :Float64;
cannam@62 157
cannam@62 158   union {
cannam@62 159     circle @1 :Float64; # radius
cannam@62 160     square @2 :Float64; # width
cannam@62 161   }
cannam@62 162 }
cannam@62 163 {% endhighlight %}
cannam@62 164
cannam@62 165 Notes:
cannam@62 166
cannam@62 167 * Union members are numbered in the same number space as fields of the containing struct.
cannam@62 168 Remember that the purpose of the numbers is to indicate the evolution order of the
cannam@62 169 struct. The system needs to know when the union fields were declared relative to the non-union
cannam@62 170 fields.
cannam@62 171
cannam@62 172 * Notice that we used the "useless" `Void` type here. We don't have any extra information to store
cannam@62 173 for the `unemployed` or `selfEmployed` cases, but we still want the union to distinguish these
cannam@62 174 states from others.
cannam@62 175
cannam@62 176 * By default, when a struct is initialized, the lowest-numbered field in the union is "set". If
cannam@62 177 you do not want any field set by default, simply declare a field called "unset" and make it the
cannam@62 178 lowest-numbered field.
cannam@62 179
cannam@62 180 * You can move an existing field into a new union without breaking compatibility with existing
cannam@62 181 data, as long as all of the other fields in the union are new. Since the existing field is
cannam@62 182 necessarily the lowest-numbered in the union, it will be the union's default field.
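
The last two notes can be sketched as follows. All type and field names here are hypothetical; the two `Account` declarations stand for successive versions of the same schema, not for two types in one file.

{% highlight capnp %}
struct Contact {
  name @0 :Text;

  preferred :union {
    unset @1 :Void;
    # Lowest-numbered member of the union, so a freshly
    # initialized Contact starts out in this state.
    email @2 :Text;
    phone @3 :Text;
  }
}

# Version 1 of a hypothetical schema.
struct Account {
  id @0 :UInt64;
  password @1 :Text;
}

# Version 2: `password` is moved into a new union.
struct Account {
  id @0 :UInt64;

  union {
    password @1 :Text;   # existing field, now the union's default
    publicKey @2 :Data;  # new field, mutually exclusive with password
  }
}
{% endhighlight %}

Because every other member of the union in version 2 is new, data written with version 1 should still read correctly, with `password` appearing as the member that is set.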
cannam@62 183
cannam@62 184 **Wait, why aren't unions first-class types?**
cannam@62 185
cannam@62 186 Requiring unions to be declared inside a struct, rather than living as free-standing types, has
cannam@62 187 some important advantages:
cannam@62 188
cannam@62 189 * If unions were first-class types, then union members would clearly have to be numbered separately
cannam@62 190 from the containing type's fields. This means that the compiler, when deciding how to position
cannam@62 191 the union in its containing struct, would have to conservatively assume that any kind of new
cannam@62 192 field might be added to the union in the future. To support this, all unions would have to
cannam@62 193 be allocated as separate objects embedded by pointer, wasting space.
cannam@62 194
cannam@62 195 * A free-standing union would be a liability for protocol evolution, because no additional data
cannam@62 196 can be attached to it later on. Consider, for example, a type which represents a parser token.
cannam@62 197 This type is naturally a union: it may be a keyword, identifier, numeric literal, quoted string,
cannam@62 198 etc. So the author defines it as a union, and the type is used widely. Later on, the developer
cannam@62 199 wants to attach information to the token indicating its line and column number in the source
cannam@62 200 file. Unfortunately, this is impossible without updating all users of the type, because the new
cannam@62 201 information ought to apply to _all_ token instances, not just specific members of the union. On
cannam@62 202 the other hand, if unions must be embedded within structs, it is always possible to add new
cannam@62 203 fields to the struct later on.
cannam@62 204
cannam@62 205 * When evolving a protocol it is common to discover that some existing field really should have
cannam@62 206 been enclosed in a union, because new fields being added are mutually exclusive with it. With
cannam@62 207 Cap'n Proto's unions, it is actually possible to "retroactively unionize" such a field without
cannam@62 208 changing its layout. This allows you to continue being able to read old data without wasting
cannam@62 209 space when writing new data. This is only possible when unions are declared within their
cannam@62 210 containing struct.
cannam@62 211
cannam@62 212 Cap'n Proto's unconventional approach to unions provides these advantages without any real
cannam@62 213 downside: where you would conventionally define a free-standing union type, in Cap'n Proto you
cannam@62 214 may simply define a struct type that contains only that union (probably unnamed), and you have
cannam@62 215 achieved the same effect. Thus, aside from being slightly unintuitive, it is strictly superior.
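
The parser-token scenario described above might look like the sketch below. The field names are assumptions chosen for illustration.

{% highlight capnp %}
struct Token {
  # Stands in for a "free-standing union": a struct whose main
  # body is a single unnamed union.
  union {
    keyword @0 :Text;
    identifier @1 :Text;
    numericLiteral @2 :Float64;
    quotedString @3 :Text;
  }

  line @4 :UInt32;
  column @5 :UInt32;
  # Added later. These apply to every token, whichever union
  # member is set, and existing users of Token are unaffected.
}
{% endhighlight %}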
cannam@62 216
cannam@62 217 ### Groups
cannam@62 218
cannam@62 219 A group is a set of fields that are encapsulated in their own scope.
cannam@62 220
cannam@62 221 {% highlight capnp %}
cannam@62 222 struct Person {
cannam@62 223   # ...
cannam@62 224
cannam@62 225   # Note: This is a terrible way to use groups, and meant
cannam@62 226   # only to demonstrate the syntax.
cannam@62 227   address :group {
cannam@62 228     houseNumber @8 :UInt32;
cannam@62 229     street @9 :Text;
cannam@62 230     city @10 :Text;
cannam@62 231     country @11 :Text;
cannam@62 232   }
cannam@62 233 }
cannam@62 234 {% endhighlight %}
cannam@62 235
cannam@62 236 Interface-wise, the above group behaves as if you had defined a nested struct called `Address` and
cannam@62 237 then a field `address :Address`. However, a group is _not_ a separate object from its containing
cannam@62 238 struct: the fields are numbered in the same space as the containing struct's fields, and are laid
cannam@62 239 out exactly the same as if they hadn't been grouped at all. Essentially, a group is just a
cannam@62 240 namespace.
cannam@62 241
cannam@62 242 Groups on their own (as in the above example) are useless, almost as much so as the `Void` type.
cannam@62 243 They become interesting when used together with unions.
cannam@62 244
cannam@62 245 {% highlight capnp %}
cannam@62 246 struct Shape {
cannam@62 247   area @0 :Float64;
cannam@62 248
cannam@62 249   union {
cannam@62 250     circle :group {
cannam@62 251       radius @1 :Float64;
cannam@62 252     }
cannam@62 253     rectangle :group {
cannam@62 254       width @2 :Float64;
cannam@62 255       height @3 :Float64;
cannam@62 256     }
cannam@62 257   }
cannam@62 258 }
cannam@62 259 {% endhighlight %}
cannam@62 260
cannam@62 261 There are two main reasons to use groups with unions:
cannam@62 262
cannam@62 263 1. They are often more self-documenting. Notice that `radius` is now a member of `circle`, so
cannam@62 264 we don't need a comment to explain that the value of `circle` is its radius.
cannam@62 265 2. You can add additional members later on, without breaking compatibility. Notice how we upgraded
cannam@62 266 `square` to `rectangle` above, adding a `height` field. This definition is actually
cannam@62 267 wire-compatible with the previous version of the `Shape` example from the "union" section
cannam@62 268 (aside from the fact that `height` will always be zero when reading old data -- hey, it's not
cannam@62 269 a perfect example). In real-world use, it is common to realize after the fact that you need to
cannam@62 270 add some information to a struct that only applies when one particular union field is set.
cannam@62 271 Without the ability to upgrade to a group, you would have to define the new field separately,
cannam@62 272 and have it waste space when not relevant.
cannam@62 273
cannam@62 274 Note that a named union is actually exactly equivalent to a named group containing an unnamed
cannam@62 275 union.
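
As an illustration of that equivalence, the two hypothetical structs below declare `employment` in the two forms that the paragraph above describes as equivalent.

{% highlight capnp %}
struct PersonA {
  employment :union {
    unemployed @0 :Void;
    employer @1 :Text;
  }
}

struct PersonB {
  # Same shape as PersonA, spelled as a named group that
  # contains an unnamed union.
  employment :group {
    union {
      unemployed @0 :Void;
      employer @1 :Text;
    }
  }
}
{% endhighlight %}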
cannam@62 276
cannam@62 277 **Wait, weren't groups considered a misfeature in Protobufs? Why did you do this again?**
cannam@62 278
cannam@62 279 They are useful in unions, which Protobufs did not have. Meanwhile, you cannot have a "repeated
cannam@62 280 group" in Cap'n Proto, which was the case that got into the most trouble with Protobufs.
cannam@62 281
cannam@62 282 ### Dynamically-typed Fields
cannam@62 283
cannam@62 284 A struct may have a field with type `AnyPointer`. This field's value can be of any pointer type --
cannam@62 285 i.e. any struct, interface, list, or blob. This is essentially like a `void*` in C.
cannam@62 286
cannam@62 287 See also [generics](#generic-types).
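
A minimal sketch of a dynamically-typed field, using a hypothetical `Envelope` struct. Pairing the pointer with a type ID is one possible convention for telling the reader how to interpret the payload; it is not something the language requires.

{% highlight capnp %}
struct Envelope {
  typeId @0 :UInt64;
  # Assumed convention: the unique ID of the type stored in `payload`.

  payload @1 :AnyPointer;
  # May hold any struct, interface, list, or blob.
}
{% endhighlight %}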
cannam@62 288
cannam@62 289 ### Enums
cannam@62 290
cannam@62 291 An enum is a type with a small finite set of symbolic values.
cannam@62 292
cannam@62 293 {% highlight capnp %}
cannam@62 294 enum Rfc3092Variable {
cannam@62 295   foo @0;
cannam@62 296   bar @1;
cannam@62 297   baz @2;
cannam@62 298   qux @3;
cannam@62 299   # ...
cannam@62 300 }
cannam@62 301 {% endhighlight %}
cannam@62 302
cannam@62 303 Like fields, enumerants must be numbered sequentially starting from zero. In languages where
cannam@62 304 enums have numeric values, these numbers will be used, but in general Cap'n Proto enums should not
cannam@62 305 be considered numeric.
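
For example, a hypothetical enum might gain an enumerant in a later revision and be used as a field type with a symbolic default:

{% highlight capnp %}
enum Color {
  red @0;
  green @1;
  blue @2;
  purple @3;  # added later, so it takes the next number
}

struct Pixel {
  color @0 :Color = blue;
  # Defaults are written with the enumerant's name, not its number.
}
{% endhighlight %}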
cannam@62 306
cannam@62 307 ### Interfaces
cannam@62 308
cannam@62 309 An interface has a collection of methods, each of which takes some parameters and returns some
cannam@62 310 results. Like struct fields, methods are numbered. Interfaces support inheritance, including
cannam@62 311 multiple inheritance.
cannam@62 312
cannam@62 313 {% highlight capnp %}
cannam@62 314 interface Node {
cannam@62 315   isDirectory @0 () -> (result :Bool);
cannam@62 316 }
cannam@62 317
cannam@62 318 interface Directory extends(Node) {
cannam@62 319   list @0 () -> (list :List(Entry));
cannam@62 320   struct Entry {
cannam@62 321     name @0 :Text;
cannam@62 322     node @1 :Node;
cannam@62 323   }
cannam@62 324
cannam@62 325   create @1 (name :Text) -> (file :File);
cannam@62 326   mkdir @2 (name :Text) -> (directory :Directory);
cannam@62 327   open @3 (name :Text) -> (node :Node);
cannam@62 328   delete @4 (name :Text);
cannam@62 329   link @5 (name :Text, node :Node);
cannam@62 330 }
cannam@62 331
cannam@62 332 interface File extends(Node) {
cannam@62 333   size @0 () -> (size :UInt64);
cannam@62 334   read @1 (startAt :UInt64 = 0, amount :UInt64 = 0xffffffffffffffff)
cannam@62 335       -> (data :Data);
cannam@62 336   # Default params = read entire file.
cannam@62 337
cannam@62 338   write @2 (startAt :UInt64, data :Data);
cannam@62 339   truncate @3 (size :UInt64);
cannam@62 340 }
cannam@62 341 {% endhighlight %}
cannam@62 342
cannam@62 343 Notice something interesting here: `Node`, `Directory`, and `File` are interfaces, but several
cannam@62 344 methods take these types as parameters or return them as results. `Directory.Entry` is a struct,
cannam@62 345 but it contains a `Node`, which is an interface. Structs (and primitive types) are passed over RPC
cannam@62 346 by value, but interfaces are passed by reference. So when `Directory.list` is called remotely, the
cannam@62 347 content of a `List(Entry)` (including the text of each `name`) is transmitted back, but for the
cannam@62 348 `node` field, only a reference to some remote `Node` object is sent.
cannam@62 349
cannam@62 350 When a reference to an object is transmitted, the RPC system automatically ensures that
cannam@62 351 the recipient gets permission to call the referenced object -- because if the recipient wasn't
cannam@62 352 meant to have access, the sender shouldn't have sent the reference in the first place. This makes
cannam@62 353 it very easy to develop secure protocols with Cap'n Proto -- you almost don't need to think about
cannam@62 354 access control at all. This feature is what makes Cap'n Proto a "capability-based" RPC system -- a
cannam@62 355 reference to an object inherently represents a "capability" to access it.
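
The filesystem example uses single inheritance only. As a sketch of the multiple inheritance mentioned at the start of this section, the hypothetical interfaces below combine two parents. Note that each interface numbers its own methods from zero, just as `Node` and `Directory` do above.

{% highlight capnp %}
interface Readable {
  read @0 () -> (data :Data);
}

interface Writable {
  write @0 (data :Data);
}

interface ReadWrite extends(Readable, Writable) {
  # Inherits read() and write(); adds one method of its own.
  flush @0 () -> ();
}
{% endhighlight %}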
cannam@62 356
cannam@62 357 ### Generic Types
cannam@62 358
cannam@62 359 A struct or interface type may be parameterized, making it "generic". For example, this is useful
cannam@62 360 for defining type-safe containers:
cannam@62 361
cannam@62 362 {% highlight capnp %}
cannam@62 363 struct Map(Key, Value) {
cannam@62 364   entries @0 :List(Entry);
cannam@62 365   struct Entry {
cannam@62 366     key @0 :Key;
cannam@62 367     value @1 :Value;
cannam@62 368   }
cannam@62 369 }
cannam@62 370
cannam@62 371 struct People {
cannam@62 372   byName @0 :Map(Text, Person);
cannam@62 373   # Maps names to Person instances.
cannam@62 374 }
cannam@62 375 {% endhighlight %}
cannam@62 376
cannam@62 377 Cap'n Proto generics work very similarly to Java generics or C++ templates. Some notes:
cannam@62 378
cannam@62 379 * Only pointer types (structs, lists, blobs, and interfaces) can be used as generic parameters,
cannam@62 380 much like in Java. This is a pragmatic limitation: allowing parameters to have non-pointer types
cannam@62 381 would mean that different parameterizations of a struct could have completely different layouts,
cannam@62 382 which would excessively complicate the Cap'n Proto implementation.
cannam@62 383
cannam@62 384 * A type declaration nested inside a generic type may use the type parameters of the outer type,
cannam@62 385 as you can see in the example above. This differs from Java, but matches C++. If you want to
cannam@62 386 refer to a nested type from outside the outer type, you must specify the parameters on the outer
cannam@62 387 type, not the inner. For example, `Map(Text, Person).Entry` is a valid type;
cannam@62 388 `Map.Entry(Text, Person)` is NOT valid. (Of course, an inner type may declare additional generic
cannam@62 389 parameters.)
cannam@62 390
cannam@62 391 * If you refer to a generic type but omit its parameters (e.g. declare a field of type `Map` rather
cannam@62 392 than `Map(T, U)`), it is as if you specified `AnyPointer` for each parameter. Note that such
cannam@62 393 a type is wire-compatible with any specific parameterization, so long as you interpret the
cannam@62 394 `AnyPointer`s as the correct type at runtime.
cannam@62 395
cannam@62 396 * Relatedly, it is safe to cast a generic interface of a specific parameterization to a generic
cannam@62 397 interface where all parameters are `AnyPointer` and vice versa, as long as the `AnyPointer`s are
cannam@62 398 treated as the correct type at runtime. This means that e.g. you can implement a server in a
cannam@62 399 generic way that is correct for all parameterizations but call it from clients using a specific
cannam@62 400 parameterization.
cannam@62 401
cannam@62 402 * The encoding of a generic type is exactly the same as the encoding of a type produced by
cannam@62 403 substituting the type parameters manually. For example, `Map(Text, Person)` is encoded exactly
cannam@62 404 the same as:
cannam@62 405
cannam@62 406 <div>{% highlight capnp %}
cannam@62 407 struct PersonMap {
cannam@62 408   # Encoded the same as Map(Text, Person).
cannam@62 409   entries @0 :List(Entry);
cannam@62 410   struct Entry {
cannam@62 411     key @0 :Text;
cannam@62 412     value @1 :Person;
cannam@62 413   }
cannam@62 414 }
cannam@62 415 {% endhighlight %}
cannam@62 416 </div>
cannam@62 417
cannam@62 418 Therefore, it is possible to upgrade non-generic types to generic types while retaining
cannam@62 419 backwards-compatibility.
cannam@62 420
cannam@62 421 * Similarly, a generic interface's protocol is exactly the same as the interface obtained by
cannam@62 422 manually substituting the generic parameters.
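
The notes on nested types and omitted parameters can be sketched with a hypothetical struct that refers to the `Map` type defined above:

{% highlight capnp %}
struct Lookup {
  # Hypothetical struct, for illustration only.
  people @0 :Map(Text, Person);

  firstEntry @1 :Map(Text, Person).Entry;
  # Parameters go on the outer type, not on Entry.

  anything @2 :Map;
  # Parameters omitted: treated as Map(AnyPointer, AnyPointer).
}
{% endhighlight %}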
cannam@62 423
cannam@62 424 ### Generic Methods
cannam@62 425
cannam@62 426 Interface methods may also have "implicit" generic parameters that apply to a particular method
cannam@62 427 call. This commonly applies to "factory" methods. For example:
cannam@62 428
cannam@62 429 {% highlight capnp %}
cannam@62 430 interface Assignable(T) {
cannam@62 431   # A generic interface, with non-generic methods.
cannam@62 432   get @0 () -> (value :T);
cannam@62 433   set @1 (value :T) -> ();
cannam@62 434 }
cannam@62 435
cannam@62 436 interface AssignableFactory {
cannam@62 437   newAssignable @0 [T] (initialValue :T)
cannam@62 438       -> (assignable :Assignable(T));
cannam@62 439   # A generic method.
cannam@62 440 }
cannam@62 441 {% endhighlight %}
cannam@62 442
cannam@62 443 Here, the method `newAssignable()` is generic. The return type of the method depends on the input
cannam@62 444 type.
cannam@62 445
cannam@62 446 Ideally, calls to a generic method should not have to explicitly specify the method's type
cannam@62 447 parameters, because they should be inferred from the types of the method's regular parameters.
cannam@62 448 However, this may not always be possible; it depends on the programming language and API details.
cannam@62 449
cannam@62 450 Note that if a method's generic parameter is used only in its returns, not its parameters, then
cannam@62 451 this implies that the returned value is appropriate for any parameterization. For example:
cannam@62 452
cannam@62 453 {% highlight capnp %}
cannam@62 454 newUnsetAssignable @1 [T] () -> (assignable :Assignable(T));
cannam@62 455 # Create a new assignable. `get()` on the returned object will
cannam@62 456 # throw an exception until `set()` has been called at least once.
cannam@62 457 {% endhighlight %}
cannam@62 458
cannam@62 459 Because of the way this method is designed, the returned `Assignable` is initially valid for any
cannam@62 460 `T`. Effectively, it doesn't take on a type until the first time `set()` is called, and then `T`
cannam@62 461 retroactively becomes the type of value passed to `set()`.
cannam@62 462
cannam@62 463 In contrast, if it's the case that the returned type is unknown, then you should NOT declare it
cannam@62 464 as generic. Instead, use `AnyPointer`, or omit a type's parameters (since they default to
cannam@62 465 `AnyPointer`). For example:
cannam@62 466
cannam@62 467 {% highlight capnp %}
cannam@62 468 getNamedAssignable @2 (name :Text) -> (assignable :Assignable);
cannam@62 469 # Get the `Assignable` with the given name. It is the
cannam@62 470 # responsibility of the caller to keep track of the type of each
cannam@62 471 # named `Assignable` and cast the returned object appropriately.
cannam@62 472 {% endhighlight %}
cannam@62 473
cannam@62 474 Here, we omitted the parameters to `Assignable` in the return type, because the returned object
cannam@62 475 has a specific type parameterization but it is not locally knowable.
cannam@62 476
cannam@62 477 ### Constants
cannam@62 478
cannam@62 479 You can define constants in Cap'n Proto. These don't affect what is sent on the wire, but they
cannam@62 480 will be included in the generated code, and can be [evaluated using the `capnp`
cannam@62 481 tool](capnp-tool.html#evaluating-constants).
cannam@62 482
cannam@62 483 {% highlight capnp %}
cannam@62 484 const pi :Float32 = 3.14159;
cannam@62 485 const bob :Person = (name = "Bob", email = "bob@example.com");
cannam@62 486 const secret :Data = 0x"9f98739c2b53835e 6720a00907abd42f";
cannam@62 487 {% endhighlight %}
cannam@62 488
cannam@62 489 Additionally, you may refer to a constant inside another value (e.g. another constant, or a default
cannam@62 490 value of a field).
cannam@62 491
cannam@62 492 {% highlight capnp %}
cannam@62 493 const foo :Int32 = 123;
cannam@62 494 const bar :Text = "Hello";
cannam@62 495 const baz :SomeStruct = (id = .foo, message = .bar);
cannam@62 496 {% endhighlight %}
cannam@62 497
cannam@62 498 Note that when substituting a constant into another value, the constant's name must be qualified
cannam@62 499 with its scope. E.g. if a constant `qux` is declared nested in a type `Corge`, it would need to
cannam@62 500 be referenced as `Corge.qux` rather than just `qux`, even when used within the `Corge` scope.
cannam@62 501 Constants declared at the top-level scope are prefixed just with `.`. This rule helps to make it
cannam@62 502 clear that the name refers to a user-defined constant, rather than a literal value (like `true` or
cannam@62 503 `inf`) or an enum value.
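
A short sketch of the qualification rule, using hypothetical names:

{% highlight capnp %}
const defaultPort :UInt16 = 8080;

struct Server {
  const maxClients :UInt32 = 64;

  port @0 :UInt16 = .defaultPort;
  # Top-level constant: prefixed with `.`.

  clientLimit @1 :UInt32 = Server.maxClients;
  # Nested constant: qualified with its scope, even though
  # this field is itself inside Server.
}
{% endhighlight %}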
cannam@62 504
cannam@62 505 ### Nesting, Scope, and Aliases
cannam@62 506
cannam@62 507 You can nest constant, alias, and type definitions inside structs and interfaces (but not enums).
cannam@62 508 This has no effect on any definition involved except to define the scope of its name. So in Java
cannam@62 509 terms, inner classes are always "static". To name a nested type from another scope, separate the
cannam@62 510 path with `.`s.
cannam@62 511
cannam@62 512 {% highlight capnp %}
cannam@62 513 struct Foo {
cannam@62 514   struct Bar {
cannam@62 515     #...
cannam@62 516   }
cannam@62 517   bar @0 :Bar;
cannam@62 518 }
cannam@62 519
cannam@62 520 struct Baz {
cannam@62 521   bar @0 :Foo.Bar;
cannam@62 522 }
cannam@62 523 {% endhighlight %}
cannam@62 524
cannam@62 525 If typing long scopes becomes cumbersome, you can use `using` to declare an alias.
cannam@62 526
cannam@62 527 {% highlight capnp %}
cannam@62 528 struct Qux {
cannam@62 529   using Foo.Bar;
cannam@62 530   bar @0 :Bar;
cannam@62 531 }
cannam@62 532
cannam@62 533 struct Corge {
cannam@62 534   using T = Foo.Bar;
cannam@62 535   bar @0 :T;
cannam@62 536 }
cannam@62 537 {% endhighlight %}
cannam@62 538
cannam@62 539 ### Imports
cannam@62 540
cannam@62 541 An `import` expression names the scope of some other file:
cannam@62 542
cannam@62 543 {% highlight capnp %}
cannam@62 544 struct Foo {
cannam@62 545   # Use type "Baz" defined in bar.capnp.
cannam@62 546   baz @0 :import "bar.capnp".Baz;
cannam@62 547 }
cannam@62 548 {% endhighlight %}
cannam@62 549
cannam@62 550 Of course, typically it's more readable to define an alias:
cannam@62 551
cannam@62 552 {% highlight capnp %}
cannam@62 553 using Bar = import "bar.capnp";
cannam@62 554
cannam@62 555 struct Foo {
cannam@62 556   # Use type "Baz" defined in bar.capnp.
cannam@62 557   baz @0 :Bar.Baz;
cannam@62 558 }
cannam@62 559 {% endhighlight %}
cannam@62 560
cannam@62 561 Or even:
cannam@62 562
cannam@62 563 {% highlight capnp %}
cannam@62 564 using import "bar.capnp".Baz;
cannam@62 565
cannam@62 566 struct Foo {
cannam@62 567 baz @0 :Baz;
cannam@62 568 }
cannam@62 569 {% endhighlight %}
cannam@62 570
cannam@62 571 The above imports specify relative paths. If the path begins with a `/`, it is absolute -- in
cannam@62 572 this case, the `capnp` tool searches for the file in each of the search path directories specified
cannam@62 573 with `-I`.
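
One common use of an absolute import is pulling in the C++ annotations that ship with Cap'n Proto. The sketch below assumes that `c++.capnp` is reachable on the import path under `/capnp/`; the namespace string is a hypothetical value.

{% highlight capnp %}
using Cxx = import "/capnp/c++.capnp";
# Absolute path: resolved against the import search path,
# not relative to this file.

$Cxx.namespace("example::schema");
# Hypothetical C++ namespace for the generated code.
{% endhighlight %}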
cannam@62 574
cannam@62 575 ### Annotations
cannam@62 576
cannam@62 577 Sometimes you want to attach extra information to parts of your protocol that isn't part of the
cannam@62 578 Cap'n Proto language. This information might control details of a particular code generator, or
cannam@62 579 you might even read it at run time to assist in some kind of dynamic message processing. For
cannam@62 580 example, you might create a field annotation which means "hide from the public", and when you send
cannam@62 581 a message to an external user, you might invoke some code first that iterates over your message and
cannam@62 582 removes all of these hidden fields.
cannam@62 583
cannam@62 584 You may declare annotations and use them like so:
cannam@62 585
cannam@62 586 {% highlight capnp %}
cannam@62 587 # Declare an annotation 'foo' which applies to struct and enum types.
cannam@62 588 annotation foo(struct, enum) :Text;
cannam@62 589
cannam@62 590 # Apply 'foo' to MyType.
cannam@62 591 struct MyType $foo("bar") {
cannam@62 592   # ...
cannam@62 593 }
cannam@62 594 {% endhighlight %}
cannam@62 595
cannam@62 596 The possible targets for an annotation are: `file`, `struct`, `field`, `union`, `enum`, `enumerant`,
cannam@62 597 `interface`, `method`, `parameter`, `annotation`, `const`. You may also specify `*` to cover them
cannam@62 598 all.
cannam@62 599
cannam@62 600 {% highlight capnp %}
cannam@62 601 # 'baz' can annotate anything!
cannam@62 602 annotation baz(*) :Int32;
cannam@62 603
cannam@62 604 $baz(1); # Annotate the file.
cannam@62 605
cannam@62 606 struct MyStruct $baz(2) {
cannam@62 607   myField @0 :Text = "default" $baz(3);
cannam@62 608   myUnion :union $baz(4) {
cannam@62 609     # ...
cannam@62 610   }
cannam@62 611 }
cannam@62 612
cannam@62 613 enum MyEnum $baz(5) {
cannam@62 614   myEnumerant @0 $baz(6);
cannam@62 615 }
cannam@62 616
cannam@62 617 interface MyInterface $baz(7) {
cannam@62 618   myMethod @0 (myParam :Text $baz(9)) -> () $baz(8);
cannam@62 619 }
cannam@62 620
cannam@62 621 annotation myAnnotation(struct) :Int32 $baz(10);
cannam@62 622 const myConst :Int32 = 123 $baz(11);
cannam@62 623 {% endhighlight %}
cannam@62 624
cannam@62 625 `Void` annotations can omit the value. Struct-typed annotations are also allowed. Tip: If
cannam@62 626 you want an annotation to have a default value, declare it as a struct with a single field with
cannam@62 627 a default value.
cannam@62 628
cannam@62 629 {% highlight capnp %}
cannam@62 630 annotation qux(struct, field) :Void;
cannam@62 631
cannam@62 632 struct MyStruct $qux {
cannam@62 633   string @0 :Text $qux;
cannam@62 634   number @1 :Int32 $qux;
cannam@62 635 }
cannam@62 636
cannam@62 637 annotation corge(file) :MyStruct;
cannam@62 638
cannam@62 639 $corge(string = "hello", number = 123);
cannam@62 640
cannam@62 641 struct Grault {
cannam@62 642   value @0 :Int32 = 123;
cannam@62 643 }
cannam@62 644
cannam@62 645 annotation grault(file) :Grault;
cannam@62 646
cannam@62 647 $grault(); # value defaults to 123
cannam@62 648 $grault(value = 456);
cannam@62 649 {% endhighlight %}
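
Returning to the "hide from the public" idea from the start of this section, such an annotation might be declared and applied as sketched below. The names `hidden` and `UserRecord` are hypothetical, and the code that strips hidden fields at run time is not shown.

{% highlight capnp %}
annotation hidden(field) :Void;
# A Void annotation, so no value is needed where it is applied.

struct UserRecord {
  displayName @0 :Text;
  passwordHash @1 :Data $hidden;
  # Marked for removal before the message leaves the system.
}
{% endhighlight %}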
cannam@62 650
cannam@62 651 ### Unique IDs
cannam@62 652
cannam@62 653 A Cap'n Proto file must have a unique 64-bit ID, and each type and annotation defined therein may
cannam@62 654 also have an ID. Use `capnp id` to generate a new ID randomly. ID specifications begin with `@`:
cannam@62 655
cannam@62 656 {% highlight capnp %}
cannam@62 657 # file ID
cannam@62 658 @0xdbb9ad1f14bf0b36;
cannam@62 659
cannam@62 660 struct Foo @0x8db435604d0d3723 {
cannam@62 661 # ...
cannam@62 662 }
cannam@62 663
cannam@62 664 enum Bar @0xb400f69b5334aab3 {
cannam@62 665 # ...
cannam@62 666 }
cannam@62 667
cannam@62 668 interface Baz @0xf7141baba3c12691 {
cannam@62 669 # ...
cannam@62 670 }
cannam@62 671
cannam@62 672 annotation qux @0xf8a1bedf44c89f00 (field) :Text;
cannam@62 673 {% endhighlight %}
cannam@62 674
cannam@62 675 If you omit the ID for a type or annotation, one will be assigned automatically. This default
cannam@62 676 ID is derived by taking the first 8 bytes of the MD5 hash of the parent scope's ID concatenated
cannam@62 677 with the declaration's name (where the "parent scope" is the file for top-level declarations, or
cannam@62 678 the outer type for nested declarations). You can see the automatically-generated IDs by "compiling"
cannam@62 679 your file with the `-ocapnp` flag, which echos the schema back to the terminal annotated with
cannam@62 680 extra information, e.g. `capnp compile -ocapnp myschema.capnp`. In general, you would only specify
cannam@62 681 an explicit ID for a declaration if that declaration has been renamed or moved and you want the ID
cannam@62 682 to stay the same for backwards-compatibility.
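
As a sketch of the rename case, suppose a struct was first published under the misspelled name
`Persn` with no explicit ID. The type name and the ID value below are hypothetical placeholders;
the real ID is whatever `capnp compile -ocapnp` prints for your schema.

{% highlight capnp %}
# Before the rename: the ID is implicit, derived from the file's ID and the name "Persn".
struct Persn {
  name @0 :Text;
}
{% endhighlight %}

After looking up the generated ID, the renamed declaration might look like this:

{% highlight capnp %}
# After the rename: the previously implicit ID (placeholder value) is declared explicitly,
# so the type keeps the same ID under its new name.
struct Person @0x9c4f2a8b7d6e1f03 {
  name @0 :Text;
}
{% endhighlight %}

Without the explicit ID, `Person` would be assigned a different default ID, since that ID depends
on the declaration's name.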

IDs exist to provide a relatively short yet unambiguous way to refer to a type or annotation from
another context. They may be used for representing schemas, for tagging dynamically-typed fields,
etc. Most languages prefer instead to define a symbolic global namespace e.g. full of "packages",
but this would have some important disadvantages in the context of Cap'n Proto:

* Programmers often feel the need to change symbolic names and organization in order to make their
  code cleaner, but the renamed code should still work with existing encoded data.
* It's easy for symbolic names to collide, and these collisions could be hard to detect in a large
  distributed system with many different binaries using different versions of protocols.
* Fully-qualified type names may be large and waste space when transmitted on the wire.

Note that IDs are 64-bit (actually, 63-bit, as the first bit is always 1). Random collisions
are possible, but unlikely -- there would have to be on the order of a billion types before this
becomes a real concern. Collisions from misuse (e.g. copying an example without changing the ID)
are much more likely.

## Evolving Your Protocol

A protocol can be changed in the following ways without breaking backwards-compatibility, and
without changing the [canonical](encoding.html#canonicalization) encoding of a message:

* New types, constants, and aliases can be added anywhere, since they obviously don't affect the
  encoding of any existing type.

* New fields, enumerants, and methods may be added to structs, enums, and interfaces, respectively,
  as long as each new member's number is larger than all previous members. Similarly, new fields
  may be added to existing groups and unions.

* New parameters may be added to a method. The new parameters must be added to the end of the
  parameter list and must have default values.

* Members can be re-arranged in the source code, so long as their numbers stay the same.

* Any symbolic name can be changed, as long as the type ID / ordinal numbers stay the same. Note
  that type declarations have an implicit ID generated based on their name and parent's ID, but
  you can use `capnp compile -ocapnp myschema.capnp` to find out what that number is, and then
  declare it explicitly after your rename.

* Type definitions can be moved to different scopes, as long as the type ID is declared
  explicitly.

* A field can be moved into a group or a union, as long as the group/union and all other fields
  within it are new. In other words, a field can be replaced with a group or union containing an
  equivalent field and some new fields.

* A non-generic type can be made [generic](#generic-types), and new generic parameters may be
  added to an existing generic type. Other types used inside the body of the newly-generic type can
  be replaced with the new generic parameter so long as all existing users of the type are updated
  to bind that generic parameter to the type it replaced. For example:

  <div>{% highlight capnp %}
  struct Map {
    entries @0 :List(Entry);
    struct Entry {
      key @0 :Text;
      value @1 :Text;
    }
  }
  {% endhighlight %}
  </div>

  Can change to:

  <div>{% highlight capnp %}
  struct Map(Key, Value) {
    entries @0 :List(Entry);
    struct Entry {
      key @0 :Key;
      value @1 :Value;
    }
  }
  {% endhighlight %}
  </div>

  As long as all existing uses of `Map` are replaced with `Map(Text, Text)` (and any uses of
  `Map.Entry` are replaced with `Map(Text, Text).Entry`).

  (This rule applies analogously to generic methods.)
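
Several of the rules above can be combined in a single revision. The following is a minimal
sketch rather than a recommended design; `Shape`, `Canvas`, and all member names are made up for
illustration. Suppose the first version of a schema contains:

{% highlight capnp %}
struct Shape {
  name @0 :Text;
  radius @1 :Float64;
}

interface Canvas {
  draw @0 (shape :Shape) -> ();
}
{% endhighlight %}

A later version could then be written as:

{% highlight capnp %}
struct Shape {
  name @0 :Text;

  union {
    # `radius` keeps its number; the union itself and `side` are new.
    radius @1 :Float64;
    side @2 :Float64;
  }

  # New field, numbered above all existing members.
  color @3 :Text;
}

interface Canvas {
  # New parameter, added at the end of the list with a default value.
  draw @0 (shape :Shape, scale :Float64 = 1.0) -> ();

  # New method, numbered above all existing methods.
  clear @1 () -> ();
}
{% endhighlight %}

Every pre-existing member keeps its original number, and every new member has a higher number
than the members that existed before it.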

The following changes are backwards-compatible but may change the canonical encoding of a message.
Apps that rely on canonicalization (such as some cryptographic protocols) should avoid changes in
this list, but most apps can safely use them:

* A field of type `List(T)`, where `T` is a primitive type, blob, or list, may be changed to type
  `List(U)`, where `U` is a struct type whose `@0` field is of type `T`. This rule is useful when
  you realize too late that you need to attach some extra data to each element of your list.
  Without this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
  As a special exception to this rule, `List(Bool)` may **not** be upgraded to a list of structs,
  because implementing this for bit lists has proven unreasonably expensive.
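
To illustrate the list upgrade, here is a small sketch with hypothetical names (`Playlist`,
`Track`, and their fields are not part of any real schema). The original declaration stores a
list of primitives:

{% highlight capnp %}
struct Playlist {
  trackIds @0 :List(UInt64);
}
{% endhighlight %}

It could later be changed to a list of structs whose `@0` field has the old element type:

{% highlight capnp %}
struct Playlist {
  trackIds @0 :List(Track);

  struct Track {
    id @0 :UInt64;     # same type the list elements had before
    rating @1 :UInt8;  # extra per-element data added in the new version
  }
}
{% endhighlight %}

This avoids adding a second, parallel `List(UInt8)` of ratings that would have to be kept in
step with `trackIds` by hand.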

Any change not listed above should be assumed NOT to be safe. In particular:

* You cannot change a field, method, or enumerant's number.
* You cannot change a field or method parameter's type or default value.
* You cannot change a type's ID.
* You cannot change the name of a type that doesn't have an explicit ID, as the implicit ID is
  generated based in part on the type name.
* You cannot move a type to a different scope or file unless it has an explicit ID, as the implicit
  ID is based in part on the scope's ID.
* You cannot move an existing field into or out of an existing union, nor can you form a new union
  containing more than one existing field.
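
As a counter-example for the last rule, consider this hypothetical struct (the names are
illustrative only):

{% highlight capnp %}
struct Contact {
  email @0 :Text;
  phone @1 :Text;
}
{% endhighlight %}

The following revision is NOT safe, because the new union contains two fields that already
existed:

{% highlight capnp %}
struct Contact {
  union {
    email @0 :Text;
    phone @1 :Text;
  }
}
{% endhighlight %}

A union holds only one of its members at a time, so a message written with the first version
that sets both `email` and `phone` has no faithful reading under the second.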

Also, these rules only apply to the Cap'n Proto native encoding. It is sometimes useful to
transcode Cap'n Proto types to other formats, like JSON, which may have different rules (e.g.,
field names cannot change in JSON).