---
layout: page
title: Schema Language
---

# Schema Language

Like Protocol Buffers and Thrift (but unlike JSON or MessagePack), Cap'n Proto messages are
strongly-typed and not self-describing. You must define your message structure in a special
language, then invoke the Cap'n Proto compiler (`capnp compile`) to generate source code to
manipulate that message type in your desired language.

For example:

{% highlight capnp %}
@0xdbb9ad1f14bf0b36;  # unique file ID, generated by `capnp id`

struct Person {
  name @0 :Text;
  birthdate @3 :Date;

  email @1 :Text;
  phones @2 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }
}

struct Date {
  year @0 :Int16;
  month @1 :UInt8;
  day @2 :UInt8;
}
{% endhighlight %}

Some notes:

* Types come after names. The name is by far the most important thing to see, especially when
  quickly skimming, so we put it up front where it is most visible. Sorry, C got it wrong.
* The `@N` annotations show how the protocol evolved over time, so that the system can make sure
  to maintain compatibility with older versions. Fields (and enumerants, and interface methods)
  must be numbered consecutively starting from zero in the order in which they were added. In this
  example, it looks like the `birthdate` field was added to the `Person` structure recently -- its
  number is higher than the `email` and `phones` fields. Unlike Protobufs, you cannot skip numbers
  when defining fields -- but there was never any reason to do so anyway.
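
As a rough sketch of the numbering rule, suppose a later revision of the example schema adds a
`nickname` field. The field name is hypothetical and not part of the original example. It takes
the next unused number, `@4`, and may be written anywhere in the struct body:

{% highlight capnp %}
struct Person {
  name @0 :Text;
  nickname @4 :Text;     # hypothetical new field: takes the next unused number
  birthdate @3 :Date;

  email @1 :Text;
  phones @2 :List(PhoneNumber);

  # (The nested `PhoneNumber` struct is omitted here; it would be unchanged.)
}
{% endhighlight %}

The position of a declaration in the file is free; only the number records when the field was
added. Numbering the new field `@5` would skip a number, which the rule above does not allow.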

## Language Reference

### Comments

Comments are indicated by hash signs and extend to the end of the line:

{% highlight capnp %}
# This is a comment.
{% endhighlight %}

Comments meant as documentation should appear _after_ the declaration, either on the same line, or
on a subsequent line. Doc comments for aggregate definitions should appear on the line after the
opening brace.

{% highlight capnp %}
struct Date {
  # A standard Gregorian calendar date.

  year @0 :Int16;
  # The year. Must include the century.
  # Negative value indicates BC.

  month @1 :UInt8;   # Month number, 1-12.
  day @2 :UInt8;     # Day number, 1-31.
}
{% endhighlight %}

Placing the comment _after_ the declaration rather than before makes the code more readable,
especially when doc comments grow long. You almost always need to see the declaration before you
can start reading the comment.

### Built-in Types

The following types are automatically defined:

* **Void:** `Void`
* **Boolean:** `Bool`
* **Integers:** `Int8`, `Int16`, `Int32`, `Int64`
* **Unsigned integers:** `UInt8`, `UInt16`, `UInt32`, `UInt64`
* **Floating-point:** `Float32`, `Float64`
* **Blobs:** `Text`, `Data`
* **Lists:** `List(T)`

Notes:

* The `Void` type has exactly one possible value, and thus can be encoded in zero bits. It is
  rarely used, but can be useful as a union member.
* `Text` is always UTF-8 encoded and NUL-terminated.
* `Data` is a completely arbitrary sequence of bytes.
* `List` is a parameterized type, where the parameter is the element type. For example,
  `List(Int32)`, `List(Person)`, and `List(List(Text))` are all valid.
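
To make the list above concrete, here is a small sketch of a struct that uses most of the
built-in types. The struct and its field names are hypothetical, chosen only for illustration:

{% highlight capnp %}
struct Measurement {
  # Hypothetical struct, for illustration only.

  valid @0 :Bool;
  sampleCount @1 :UInt32;
  offset @2 :Int16;
  mean @3 :Float64;
  label @4 :Text;                  # UTF-8 text
  rawBytes @5 :Data;               # arbitrary bytes
  histogram @6 :List(UInt32);
  tags @7 :List(Text);
  grid @8 :List(List(Float32));    # list parameters may themselves be lists
}
{% endhighlight %}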

### Structs

A struct has a set of named, typed fields, numbered consecutively starting from zero.

{% highlight capnp %}
struct Person {
  name @0 :Text;
  email @1 :Text;
}
{% endhighlight %}

Fields can have default values:

{% highlight capnp %}
foo @0 :Int32 = 123;
bar @1 :Text = "blah";
baz @2 :List(Bool) = [ true, false, false, true ];
qux @3 :Person = (name = "Bob", email = "bob@example.com");
corge @4 :Void = void;
grault @5 :Data = 0x"a1 40 33";
{% endhighlight %}

### Unions

A union is two or more fields of a struct which are stored in the same location. Only one of
these fields can be set at a time, and a separate tag is maintained to track which one is
currently set. Unlike in C, unions are not types; they are simply properties of fields, so
union declarations do not look like types.

{% highlight capnp %}
struct Person {
  # ...

  employment :union {
    unemployed @4 :Void;
    employer @5 :Company;
    school @6 :School;
    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
}
{% endhighlight %}

Additionally, unions can be unnamed. Each struct can contain no more than one unnamed union. Use
unnamed unions in cases where you would struggle to think of an appropriate name for the union,
because the union represents the main body of the struct.

{% highlight capnp %}
struct Shape {
  area @0 :Float64;

  union {
    circle @1 :Float64;      # radius
    square @2 :Float64;      # width
  }
}
{% endhighlight %}

Notes:

* Union members are numbered in the same number space as fields of the containing struct.
  Remember that the purpose of the numbers is to indicate the evolution order of the
  struct. The system needs to know when the union fields were declared relative to the non-union
  fields.

* Notice that we used the "useless" `Void` type here. We don't have any extra information to store
  for the `unemployed` or `selfEmployed` cases, but we still want the union to distinguish these
  states from others.

* By default, when a struct is initialized, the lowest-numbered field in the union is "set". If
  you do not want any field set by default, simply declare a field called "unset" and make it the
  lowest-numbered field.

* You can move an existing field into a new union without breaking compatibility with existing
  data, as long as all of the other fields in the union are new. Since the existing field is
  necessarily the lowest-numbered in the union, it will be the union's default field.
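
The last two notes can be sketched as follows. Both structs and all of their field names are
hypothetical. The first shows the "unset" convention:

{% highlight capnp %}
struct Contact {
  # Hypothetical struct.
  name @0 :Text;

  preferred :union {
    unset @1 :Void;      # lowest-numbered member, so it is the default
    email @2 :Text;
    phone @3 :Text;
  }
}
{% endhighlight %}

The second shows a field that was moved into a union after the fact. This sketch assumes that the
first version of `Account` declared `email @1 :Text` directly in the struct body:

{% highlight capnp %}
struct Account {
  # Hypothetical struct, second version.
  id @0 :UInt64;

  contact :union {
    email @1 :Text;      # existing field, number unchanged; lowest in the union
    phone @2 :Text;      # new field, added together with the union
  }
}
{% endhighlight %}

Because `email` keeps its number and every other member is new, old data should read back with
`email` as the active member.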

**Wait, why aren't unions first-class types?**

Requiring unions to be declared inside a struct, rather than living as free-standing types, has
some important advantages:

* If unions were first-class types, then union members would clearly have to be numbered separately
  from the containing type's fields. This means that the compiler, when deciding how to position
  the union in its containing struct, would have to conservatively assume that any kind of new
  field might be added to the union in the future. To support this, all unions would have to
  be allocated as separate objects embedded by pointer, wasting space.

* A free-standing union would be a liability for protocol evolution, because no additional data
  can be attached to it later on. Consider, for example, a type which represents a parser token.
  This type is naturally a union: it may be a keyword, identifier, numeric literal, quoted string,
  etc. So the author defines it as a union, and the type is used widely. Later on, the developer
  wants to attach information to the token indicating its line and column number in the source
  file. Unfortunately, this is impossible without updating all users of the type, because the new
  information ought to apply to _all_ token instances, not just specific members of the union. On
  the other hand, if unions must be embedded within structs, it is always possible to add new
  fields to the struct later on.

* When evolving a protocol it is common to discover that some existing field really should have
  been enclosed in a union, because new fields being added are mutually exclusive with it. With
  Cap'n Proto's unions, it is actually possible to "retroactively unionize" such a field without
  changing its layout. This allows you to continue being able to read old data without wasting
  space when writing new data. This is only possible when unions are declared within their
  containing struct.

Cap'n Proto's unconventional approach to unions provides these advantages without any real
downside: where you would conventionally define a free-standing union type, in Cap'n Proto you
may simply define a struct type that contains only that union (probably unnamed), and you have
achieved the same effect. Thus, aside from being slightly unintuitive, it is strictly superior.
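
The parser-token scenario above might look like this in practice. This is only a sketch: the
`Token` struct and its member names are hypothetical. The struct starts out containing nothing
but an unnamed union, and the position fields are added in a later revision:

{% highlight capnp %}
struct Token {
  # Hypothetical parser token.

  union {
    keyword @0 :Text;
    identifier @1 :Text;
    numericLiteral @2 :Float64;
    quotedString @3 :Text;
  }

  line @4 :UInt32;       # added later; applies to every kind of token
  column @5 :UInt32;     # added later; applies to every kind of token
}
{% endhighlight %}

Since `line` and `column` live in the struct rather than in the union, they are available no
matter which union member is set.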

### Groups

A group is a set of fields that are encapsulated in their own scope.

{% highlight capnp %}
struct Person {
  # ...

  # Note:  This is a terrible way to use groups, and meant
  #   only to demonstrate the syntax.
  address :group {
    houseNumber @8 :UInt32;
    street @9 :Text;
    city @10 :Text;
    country @11 :Text;
  }
}
{% endhighlight %}

Interface-wise, the above group behaves as if you had defined a nested struct called `Address` and
then a field `address :Address`. However, a group is _not_ a separate object from its containing
struct: the fields are numbered in the same space as the containing struct's fields, and are laid
out exactly the same as if they hadn't been grouped at all. Essentially, a group is just a
namespace.

Groups on their own (as in the above example) are useless, almost as much so as the `Void` type.
They become interesting when used together with unions.

{% highlight capnp %}
struct Shape {
  area @0 :Float64;

  union {
    circle :group {
      radius @1 :Float64;
    }
    rectangle :group {
      width @2 :Float64;
      height @3 :Float64;
    }
  }
}
{% endhighlight %}

There are two main reasons to use groups with unions:

1. They are often more self-documenting. Notice that `radius` is now a member of `circle`, so
   we don't need a comment to explain that the value of `circle` is its radius.
2. You can add additional members later on, without breaking compatibility. Notice how we upgraded
   `square` to `rectangle` above, adding a `height` field. This definition is actually
   wire-compatible with the previous version of the `Shape` example from the "union" section
   (aside from the fact that `height` will always be zero when reading old data -- hey, it's not
   a perfect example). In real-world use, it is common to realize after the fact that you need to
   add some information to a struct that only applies when one particular union field is set.
   Without the ability to upgrade to a group, you would have to define the new field separately,
   and have it waste space when not relevant.

Note that a named union is actually exactly equivalent to a named group containing an unnamed
union.
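
To illustrate that equivalence, the `employment` union from the "Unions" section could also be
written as a group wrapping an unnamed union. This is a sketch of the rewritten form, reusing the
same member names and numbers:

{% highlight capnp %}
struct Person {
  # ...

  employment :group {
    union {
      unemployed @4 :Void;
      employer @5 :Company;
      school @6 :School;
      selfEmployed @7 :Void;
    }
  }
}
{% endhighlight %}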

**Wait, weren't groups considered a misfeature in Protobufs? Why did you do this again?**

They are useful in unions, which Protobufs did not have. Meanwhile, you cannot have a "repeated
group" in Cap'n Proto, which was the case that got into the most trouble with Protobufs.

### Dynamically-typed Fields

A struct may have a field with type `AnyPointer`. This field's value can be of any pointer type --
i.e. any struct, interface, list, or blob. This is essentially like a `void*` in C.

See also [generics](#generic-types).
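
A minimal sketch of such a field, using a hypothetical `Envelope` struct. Pairing the pointer
with a type name is just one possible convention for telling the reader how to interpret it, not
something the schema language requires:

{% highlight capnp %}
struct Envelope {
  # Hypothetical wrapper type.

  typeName @0 :Text;        # by convention here, names the type stored in `payload`
  payload @1 :AnyPointer;   # may hold any struct, interface, list, or blob
}
{% endhighlight %}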

### Enums

An enum is a type with a small finite set of symbolic values.

{% highlight capnp %}
enum Rfc3092Variable {
  foo @0;
  bar @1;
  baz @2;
  qux @3;
  # ...
}
{% endhighlight %}

Like fields, enumerants must be numbered sequentially starting from zero. In languages where
enums have numeric values, these numbers will be used, but in general Cap'n Proto enums should not
be considered numeric.

### Interfaces

An interface has a collection of methods, each of which takes some parameters and returns some
results. Like struct fields, methods are numbered. Interfaces support inheritance, including
multiple inheritance.

{% highlight capnp %}
interface Node {
  isDirectory @0 () -> (result :Bool);
}

interface Directory extends(Node) {
  list @0 () -> (list :List(Entry));
  struct Entry {
    name @0 :Text;
    node @1 :Node;
  }

  create @1 (name :Text) -> (file :File);
  mkdir @2 (name :Text) -> (directory :Directory);
  open @3 (name :Text) -> (node :Node);
  delete @4 (name :Text);
  link @5 (name :Text, node :Node);
}

interface File extends(Node) {
  size @0 () -> (size :UInt64);
  read @1 (startAt :UInt64 = 0, amount :UInt64 = 0xffffffffffffffff)
       -> (data :Data);
  # Default params = read entire file.

  write @2 (startAt :UInt64, data :Data);
  truncate @3 (size :UInt64);
}
{% endhighlight %}

Notice something interesting here: `Node`, `Directory`, and `File` are interfaces, but several
methods take these types as parameters or return them as results. `Directory.Entry` is a struct,
but it contains a `Node`, which is an interface. Structs (and primitive types) are passed over RPC
by value, but interfaces are passed by reference. So when `Directory.list` is called remotely, the
content of a `List(Entry)` (including the text of each `name`) is transmitted back, but for the
`node` field, only a reference to some remote `Node` object is sent.

When an address of an object is transmitted, the RPC system automatically makes sure that
the recipient gets permission to call the addressed object -- because if the recipient wasn't
meant to have access, the sender shouldn't have sent the reference in the first place. This makes
it very easy to develop secure protocols with Cap'n Proto -- you almost don't need to think about
access control at all. This feature is what makes Cap'n Proto a "capability-based" RPC system -- a
reference to an object inherently represents a "capability" to access it.
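
The example above only uses single inheritance. As a sketch of the multiple inheritance mentioned
at the start of this section, and assuming that several superclasses are listed comma-separated
inside `extends(...)`, a hypothetical `Stream` interface could combine two others. All three
interface names are invented for illustration:

{% highlight capnp %}
interface Readable {
  read @0 (maxBytes :UInt32) -> (data :Data);
}

interface Writable {
  write @0 (data :Data);
}

interface Stream extends(Readable, Writable) {
  # Inherits read() and write() from its superclasses.
  close @0 ();
}
{% endhighlight %}

As in the `Directory` and `File` example, each interface numbers its own methods from zero,
independently of the interfaces it extends.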

### Generic Types

A struct or interface type may be parameterized, making it "generic". For example, this is useful
for defining type-safe containers:

{% highlight capnp %}
struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}

struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}
{% endhighlight %}

Cap'n Proto generics work very similarly to Java generics or C++ templates. Some notes:

* Only pointer types (structs, lists, blobs, and interfaces) can be used as generic parameters,
  much like in Java. This is a pragmatic limitation: allowing parameters to have non-pointer types
  would mean that different parameterizations of a struct could have completely different layouts,
  which would excessively complicate the Cap'n Proto implementation.

* A type declaration nested inside a generic type may use the type parameters of the outer type,
  as you can see in the example above. This differs from Java, but matches C++. If you want to
  refer to a nested type from outside the outer type, you must specify the parameters on the outer
  type, not the inner. For example, `Map(Text, Person).Entry` is a valid type;
  `Map.Entry(Text, Person)` is NOT valid. (Of course, an inner type may declare additional generic
  parameters.)

* If you refer to a generic type but omit its parameters (e.g. declare a field of type `Map` rather
  than `Map(T, U)`), it is as if you specified `AnyPointer` for each parameter. Note that such
  a type is wire-compatible with any specific parameterization, so long as you interpret the
  `AnyPointer`s as the correct type at runtime.

* Relatedly, it is safe to cast a generic interface of a specific parameterization to a generic
  interface where all parameters are `AnyPointer` and vice versa, as long as the `AnyPointer`s are
  treated as the correct type at runtime. This means that e.g. you can implement a server in a
  generic way that is correct for all parameterizations but call it from clients using a specific
  parameterization.

* The encoding of a generic type is exactly the same as the encoding of a type produced by
  substituting the type parameters manually. For example, `Map(Text, Person)` is encoded exactly
  the same as:

  <div>{% highlight capnp %}
  struct PersonMap {
    # Encoded the same as Map(Text, Person).
    entries @0 :List(Entry);
    struct Entry {
      key @0 :Text;
      value @1 :Person;
    }
  }
  {% endhighlight %}
  </div>

  Therefore, it is possible to upgrade non-generic types to generic types while retaining
  backwards-compatibility.

* Similarly, a generic interface's protocol is exactly the same as the interface obtained by
  manually substituting the generic parameters.
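
The notes on nested types and omitted parameters can be sketched with a hypothetical `Registry`
struct that reuses the `Map` and `Person` types from the examples above:

{% highlight capnp %}
struct Registry {
  # Hypothetical struct, for illustration only.

  people @0 :Map(Text, Person);
  newest @1 :Map(Text, Person).Entry;   # parameters go on the outer type
  untyped @2 :Map;                      # as if declared Map(AnyPointer, AnyPointer)
}
{% endhighlight %}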

### Generic Methods

Interface methods may also have "implicit" generic parameters that apply to a particular method
call. This commonly applies to "factory" methods. For example:

{% highlight capnp %}
interface Assignable(T) {
  # A generic interface, with non-generic methods.
  get @0 () -> (value :T);
  set @1 (value :T) -> ();
}

interface AssignableFactory {
  newAssignable @0 [T] (initialValue :T)
      -> (assignable :Assignable(T));
  # A generic method.
}
{% endhighlight %}

Here, the method `newAssignable()` is generic. The return type of the method depends on the input
type.

Ideally, calls to a generic method should not have to explicitly specify the method's type
parameters, because they should be inferred from the types of the method's regular parameters.
However, this may not always be possible; it depends on the programming language and API details.

Note that if a method's generic parameter is used only in its returns, not its parameters, then
this implies that the returned value is appropriate for any parameterization. For example:

{% highlight capnp %}
newUnsetAssignable @1 [T] () -> (assignable :Assignable(T));
# Create a new assignable. `get()` on the returned object will
# throw an exception until `set()` has been called at least once.
{% endhighlight %}

Because of the way this method is designed, the returned `Assignable` is initially valid for any
`T`. Effectively, it doesn't take on a type until the first time `set()` is called, and then `T`
retroactively becomes the type of value passed to `set()`.

In contrast, if it's the case that the returned type is unknown, then you should NOT declare it
as generic. Instead, use `AnyPointer`, or omit a type's parameters (since they default to
`AnyPointer`). For example:

{% highlight capnp %}
getNamedAssignable @2 (name :Text) -> (assignable :Assignable);
# Get the `Assignable` with the given name. It is the
# responsibility of the caller to keep track of the type of each
# named `Assignable` and cast the returned object appropriately.
{% endhighlight %}

Here, we omitted the parameters to `Assignable` in the return type, because the returned object
has a specific type parameterization but it is not locally knowable.

### Constants

You can define constants in Cap'n Proto. These don't affect what is sent on the wire, but they
will be included in the generated code, and can be [evaluated using the `capnp`
tool](capnp-tool.html#evaluating-constants).

{% highlight capnp %}
const pi :Float32 = 3.14159;
const bob :Person = (name = "Bob", email = "bob@example.com");
const secret :Data = 0x"9f98739c2b53835e 6720a00907abd42f";
{% endhighlight %}

Additionally, you may refer to a constant inside another value (e.g. another constant, or a default
value of a field).

{% highlight capnp %}
const foo :Int32 = 123;
const bar :Text = "Hello";
const baz :SomeStruct = (id = .foo, message = .bar);
{% endhighlight %}

Note that when substituting a constant into another value, the constant's name must be qualified
with its scope. E.g. if a constant `qux` is declared nested in a type `Corge`, it would need to
be referenced as `Corge.qux` rather than just `qux`, even when used within the `Corge` scope.
Constants declared at the top-level scope are prefixed just with `.`. This rule helps to make it
clear that the name refers to a user-defined constant, rather than a literal value (like `true` or
`inf`) or an enum value.
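
The qualification rule can be sketched with a hypothetical `Server` struct. The struct, its
fields, and both constants are invented for illustration:

{% highlight capnp %}
const defaultPort :UInt16 = 8080;

struct Server {
  # Hypothetical struct.

  const maxClients :UInt32 = 64;

  port @0 :UInt16 = .defaultPort;
  # Top-level constant: prefixed with `.`.

  clientLimit @1 :UInt32 = Server.maxClients;
  # Nested constant: qualified with `Server`, even inside `Server` itself.
}
{% endhighlight %}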

### Nesting, Scope, and Aliases

You can nest constant, alias, and type definitions inside structs and interfaces (but not enums).
This has no effect on any definition involved except to define the scope of its name. So in Java
terms, inner classes are always "static". To name a nested type from another scope, separate the
path with `.`s.

{% highlight capnp %}
struct Foo {
  struct Bar {
    #...
  }
  bar @0 :Bar;
}

struct Baz {
  bar @0 :Foo.Bar;
}
{% endhighlight %}

If typing long scopes becomes cumbersome, you can use `using` to declare an alias.

{% highlight capnp %}
struct Qux {
  using Foo.Bar;
  bar @0 :Bar;
}

struct Corge {
  using T = Foo.Bar;
  bar @0 :T;
}
{% endhighlight %}

### Imports

An `import` expression names the scope of some other file:

{% highlight capnp %}
struct Foo {
  # Use type "Baz" defined in bar.capnp.
  baz @0 :import "bar.capnp".Baz;
}
{% endhighlight %}

Of course, typically it's more readable to define an alias:

{% highlight capnp %}
using Bar = import "bar.capnp";

struct Foo {
  # Use type "Baz" defined in bar.capnp.
  baz @0 :Bar.Baz;
}
{% endhighlight %}

Or even:

{% highlight capnp %}
using import "bar.capnp".Baz;

struct Foo {
  baz @0 :Baz;
}
{% endhighlight %}

The above imports specify relative paths. If the path begins with a `/`, it is absolute -- in
this case, the `capnp` tool searches for the file in each of the search path directories specified
with `-I`.
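
For comparison, an absolute import might look like the sketch below. The file
`/common/types.capnp` and the `Timestamp` type are hypothetical. The example assumes that one of
the `-I` directories, say `schemas` passed as `-Ischemas`, contains `common/types.capnp`:

{% highlight capnp %}
# Looked up in the -I search path directories, not relative to this file.
using Common = import "/common/types.capnp";   # hypothetical file

struct Order {
  placedOn @0 :Common.Timestamp;               # hypothetical type
}
{% endhighlight %}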

### Annotations

Sometimes you want to attach extra information to parts of your protocol that isn't part of the
Cap'n Proto language. This information might control details of a particular code generator, or
you might even read it at run time to assist in some kind of dynamic message processing. For
example, you might create a field annotation which means "hide from the public", and when you send
a message to an external user, you might invoke some code first that iterates over your message and
removes all of these hidden fields.

You may declare annotations and use them like so:

{% highlight capnp %}
# Declare an annotation 'foo' which applies to struct and enum types.
annotation foo(struct, enum) :Text;

# Apply 'foo' to MyType.
struct MyType $foo("bar") {
  # ...
}
{% endhighlight %}

The possible targets for an annotation are: `file`, `struct`, `field`, `union`, `enum`, `enumerant`,
`interface`, `method`, `parameter`, `annotation`, `const`. You may also specify `*` to cover them
all.

{% highlight capnp %}
# 'baz' can annotate anything!
annotation baz(*) :Int32;

$baz(1);  # Annotate the file.

struct MyStruct $baz(2) {
  myField @0 :Text = "default" $baz(3);
  myUnion :union $baz(4) {
    # ...
  }
}

enum MyEnum $baz(5) {
  myEnumerant @0 $baz(6);
}

interface MyInterface $baz(7) {
  myMethod @0 (myParam :Text $baz(9)) -> () $baz(8);
}

annotation myAnnotation(struct) :Int32 $baz(10);
const myConst :Int32 = 123 $baz(11);
{% endhighlight %}

`Void` annotations can omit the value. Struct-typed annotations are also allowed. Tip: If
you want an annotation to have a default value, declare it as a struct with a single field with
a default value.

{% highlight capnp %}
annotation qux(struct, field) :Void;

struct MyStruct $qux {
  string @0 :Text $qux;
  number @1 :Int32 $qux;
}

annotation corge(file) :MyStruct;

$corge(string = "hello", number = 123);

struct Grault {
  value @0 :Int32 = 123;
}

annotation grault(file) :Grault;

$grault();  # value defaults to 123
$grault(value = 456);
{% endhighlight %}
cannam@48
|
650
|
cannam@48
|
651 ### Unique IDs
|
cannam@48
|
652
|
cannam@48
|
653 A Cap'n Proto file must have a unique 64-bit ID, and each type and annotation defined therein may
|
cannam@48
|
654 also have an ID. Use `capnp id` to generate a new ID randomly. ID specifications begin with `@`:
|
cannam@48
|
655
|
cannam@48
|
656 {% highlight capnp %}
|
cannam@48
|
657 # file ID
|
cannam@48
|
658 @0xdbb9ad1f14bf0b36;
|
cannam@48
|
659
|
cannam@48
|
660 struct Foo @0x8db435604d0d3723 {
|
cannam@48
|
661 # ...
|
cannam@48
|
662 }
|
cannam@48
|
663
|
cannam@48
|
664 enum Bar @0xb400f69b5334aab3 {
|
cannam@48
|
665 # ...
|
cannam@48
|
666 }
|
cannam@48
|
667
|
cannam@48
|
668 interface Baz @0xf7141baba3c12691 {
|
cannam@48
|
669 # ...
|
cannam@48
|
670 }
|
cannam@48
|
671
|
cannam@48
|
672 annotation qux @0xf8a1bedf44c89f00 (field) :Text;
|
cannam@48
|
673 {% endhighlight %}
|
cannam@48
|
674
|
cannam@48
|
675 If you omit the ID for a type or annotation, one will be assigned automatically. This default
|
cannam@48
|
676 ID is derived by taking the first 8 bytes of the MD5 hash of the parent scope's ID concatenated
|
cannam@48
|
677 with the declaration's name (where the "parent scope" is the file for top-level declarations, or
|
cannam@48
|
678 the outer type for nested declarations). You can see the automatically-generated IDs by "compiling"
|
cannam@48
|
679 your file with the `-ocapnp` flag, which echoes the schema back to the terminal annotated with
|
cannam@48
|
680 extra information, e.g. `capnp compile -ocapnp myschema.capnp`. In general, you would only specify
|
cannam@48
|
681 an explicit ID for a declaration if that declaration has been renamed or moved and you want the ID
|
cannam@48
|
682 to stay the same for backwards-compatibility.
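The derivation described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (MD5 over the parent ID followed by the name, keep the first 8 bytes, first bit set to 1). The function name `derive_child_id` is hypothetical, and the byte orders used to serialize the parent ID and to read back the digest are assumptions that the text does not specify, so the printed value may not match what the compiler assigns. Treat the output of `capnp compile -ocapnp` as authoritative.

```python
import hashlib

def derive_child_id(parent_id: int, name: str) -> int:
    """Sketch of the default-ID rule: MD5(parent ID bytes + name), first 8 bytes."""
    # Assumption: the parent ID is serialized as 8 little-endian bytes.
    parent_bytes = parent_id.to_bytes(8, "little")
    digest = hashlib.md5(parent_bytes + name.encode("utf-8")).digest()
    # Assumption: the first 8 digest bytes are read as a big-endian integer.
    value = int.from_bytes(digest[:8], "big")
    # The first (most significant) bit of an ID is always 1.
    return value | (1 << 63)

file_id = 0xdbb9ad1f14bf0b36  # file ID from the example above
foo_id = derive_child_id(file_id, "Foo")
print(f"@{foo_id:#018x}")
```

Because the parent ID and the name are the only inputs, renaming or moving a declaration changes the derived value, which is why an explicit ID is needed to keep it stable.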
|
cannam@48
|
683
|
cannam@48
|
684 IDs exist to provide a relatively short yet unambiguous way to refer to a type or annotation from
|
cannam@48
|
685 another context. They may be used for representing schemas, for tagging dynamically-typed fields,
|
cannam@48
|
686 etc. Most languages prefer instead to define a symbolic global namespace e.g. full of "packages",
|
cannam@48
|
687 but this would have some important disadvantages in the context of Cap'n Proto:
|
cannam@48
|
688
|
cannam@48
|
689 * Programmers often feel the need to change symbolic names and organization in order to make their
|
cannam@48
|
690 code cleaner, but the renamed code should still work with existing encoded data.
|
cannam@48
|
691 * It's easy for symbolic names to collide, and these collisions could be hard to detect in a large
|
cannam@48
|
692 distributed system with many different binaries using different versions of protocols.
|
cannam@48
|
693 * Fully-qualified type names may be large and waste space when transmitted on the wire.
|
cannam@48
|
694
|
cannam@48
|
695 Note that IDs are 64-bit (actually, 63-bit, as the first bit is always 1). Random collisions
|
cannam@48
|
696 are possible, but unlikely -- there would have to be on the order of a billion types before this
|
cannam@48
|
697 becomes a real concern. Collisions from misuse (e.g. copying an example without changing the ID)
|
cannam@48
|
698 are much more likely.
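As a rough check on the "billion types" figure, the standard birthday approximation can be evaluated for a 63-bit ID space. This is a back-of-the-envelope sketch that assumes IDs are uniformly random. The helper name `collision_probability` is made up for illustration.

```python
import math

ID_BITS = 63  # 64-bit IDs whose first bit is fixed to 1

def collision_probability(num_ids: int, bits: int = ID_BITS) -> float:
    """Approximate chance that at least two of num_ids random IDs coincide."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))

for n in (10**6, 10**8, 10**9, 10**10):
    print(f"{n:>14,} IDs -> {collision_probability(n):.3g}")
```

Under these assumptions the probability is negligible for a million IDs, reaches about 5% at a billion, and approaches certainty at ten billion.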
|
cannam@48
|
699
|
cannam@48
|
700 ## Evolving Your Protocol
|
cannam@48
|
701
|
cannam@48
|
702 A protocol can be changed in the following ways without breaking backwards-compatibility, and
|
cannam@48
|
703 without changing the [canonical](encoding.html#canonicalization) encoding of a message:
|
cannam@48
|
704
|
cannam@48
|
705 * New types, constants, and aliases can be added anywhere, since they obviously don't affect the
|
cannam@48
|
706 encoding of any existing type.
|
cannam@48
|
707
|
cannam@48
|
708 * New fields, enumerants, and methods may be added to structs, enums, and interfaces, respectively,
|
cannam@48
|
709 as long as each new member's number is larger than all previous members. Similarly, new fields
|
cannam@48
|
710 may be added to existing groups and unions.
|
cannam@48
|
711
|
cannam@48
|
712 * New parameters may be added to a method. The new parameters must be added to the end of the
|
cannam@48
|
713 parameter list and must have default values.
|
cannam@48
|
714
|
cannam@48
|
715 * Members can be re-arranged in the source code, so long as their numbers stay the same.
|
cannam@48
|
716
|
cannam@48
|
717 * Any symbolic name can be changed, as long as the type ID / ordinal numbers stay the same. Note
|
cannam@48
|
718 that type declarations have an implicit ID generated based on their name and parent's ID, but
|
cannam@48
|
719 you can use `capnp compile -ocapnp myschema.capnp` to find out what that number is, and then
|
cannam@48
|
720 declare it explicitly after your rename.
|
cannam@48
|
721
|
cannam@48
|
722 * Type definitions can be moved to different scopes, as long as the type ID is declared
|
cannam@48
|
723 explicitly.
|
cannam@48
|
724
|
cannam@48
|
725 * A field can be moved into a group or a union, as long as the group/union and all other fields
|
cannam@48
|
726 within it are new. In other words, a field can be replaced with a group or union containing an
|
cannam@48
|
727 equivalent field and some new fields.
|
cannam@48
|
728
|
cannam@48
|
729 * A non-generic type can be made [generic](#generic-types), and new generic parameters may be
|
cannam@48
|
730 added to an existing generic type. Other types used inside the body of the newly-generic type can
|
cannam@48
|
731 be replaced with the new generic parameter so long as all existing users of the type are updated
|
cannam@48
|
732 to bind that generic parameter to the type it replaced. For example:
|
cannam@48
|
733
|
cannam@48
|
734 <div>{% highlight capnp %}
|
cannam@48
|
735 struct Map {
|
cannam@48
|
736 entries @0 :List(Entry);
|
cannam@48
|
737 struct Entry {
|
cannam@48
|
738 key @0 :Text;
|
cannam@48
|
739 value @1 :Text;
|
cannam@48
|
740 }
|
cannam@48
|
741 }
|
cannam@48
|
742 {% endhighlight %}
|
cannam@48
|
743 </div>
|
cannam@48
|
744
|
cannam@48
|
745 The definition above can change to:
|
cannam@48
|
746
|
cannam@48
|
747 <div>{% highlight capnp %}
|
cannam@48
|
748 struct Map(Key, Value) {
|
cannam@48
|
749 entries @0 :List(Entry);
|
cannam@48
|
750 struct Entry {
|
cannam@48
|
751 key @0 :Key;
|
cannam@48
|
752 value @1 :Value;
|
cannam@48
|
753 }
|
cannam@48
|
754 }
|
cannam@48
|
755 {% endhighlight %}
|
cannam@48
|
756 </div>
|
cannam@48
|
757
|
cannam@48
|
758 This is safe as long as all existing uses of `Map` are replaced with `Map(Text, Text)` (and any uses of
|
cannam@48
|
759 `Map.Entry` are replaced with `Map(Text, Text).Entry`).
|
cannam@48
|
760
|
cannam@48
|
761 (This rule applies analogously to generic methods.)
|
cannam@48
|
762
|
cannam@48
|
763 The following changes are backwards-compatible but may change the canonical encoding of a message.
|
cannam@48
|
764 Apps that rely on canonicalization (such as some cryptographic protocols) should avoid changes in
|
cannam@48
|
765 this list, but most apps can safely use them:
|
cannam@48
|
766
|
cannam@48
|
767 * A field of type `List(T)`, where `T` is a primitive type, blob, or list, may be changed to type
|
cannam@48
|
768 `List(U)`, where `U` is a struct type whose `@0` field is of type `T`. This rule is useful when
|
cannam@48
|
769 you realize too late that you need to attach some extra data to each element of your list.
|
cannam@48
|
770 Without this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
|
cannam@48
|
771 As a special exception to this rule, `List(Bool)` may **not** be upgraded to a list of structs,
|
cannam@48
|
772 because implementing this for bit lists has proven unreasonably expensive.
|
cannam@48
|
773
|
cannam@48
|
774 Any change not listed above should be assumed NOT to be safe. In particular:
|
cannam@48
|
775
|
cannam@48
|
776 * You cannot change a field, method, or enumerant's number.
|
cannam@48
|
777 * You cannot change a field or method parameter's type or default value.
|
cannam@48
|
778 * You cannot change a type's ID.
|
cannam@48
|
779 * You cannot change the name of a type that doesn't have an explicit ID, as the implicit ID is
|
cannam@48
|
780 generated based in part on the type name.
|
cannam@48
|
781 * You cannot move a type to a different scope or file unless it has an explicit ID, as the implicit
|
cannam@48
|
782 ID is based in part on the scope's ID.
|
cannam@48
|
783 * You cannot move an existing field into or out of an existing union, nor can you form a new union
|
cannam@48
|
784 containing more than one existing field.
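A few of these rules are mechanical enough to check automatically. The sketch below is not part of Cap'n Proto. It models a struct as a hypothetical mapping from field number to type name and reports violations of three rules: numbers must be consecutive from zero, existing numbers must not disappear, and existing fields must keep their type. It is deliberately stricter than the rules above. For example, it flags the permitted `List(T)` to `List(U)` upgrade, and it knows nothing about unions, groups, default values, or type IDs.

```python
def check_struct_evolution(old: dict[int, str], new: dict[int, str]) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Field numbers must be consecutive starting from zero.
    if sorted(new) != list(range(len(new))):
        problems.append("field numbers are not consecutive from 0")
    for number, old_type in old.items():
        if number not in new:
            # A missing number means the field was removed or renumbered.
            problems.append(f"field @{number} was removed or renumbered")
        elif new[number] != old_type:
            problems.append(f"field @{number} changed type: {old_type} -> {new[number]}")
    return problems

# Field names are omitted on purpose: renaming a field is allowed.
old_person = {0: "Text", 1: "Text", 2: "List(PhoneNumber)"}
new_person = {0: "Text", 1: "Text", 2: "List(PhoneNumber)", 3: "Date"}
print(check_struct_evolution(old_person, new_person))
```

Keying the mapping by field number rather than by name reflects the fact that numbers, not names, determine compatibility in the native encoding.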
|
cannam@48
|
785
|
cannam@48
|
786 Also, these rules only apply to the Cap'n Proto native encoding. It is sometimes useful to
|
cannam@48
|
787 transcode Cap'n Proto types to other formats, like JSON, which may have different rules (e.g.,
|
cannam@48
|
788 field names cannot change in JSON).
|