Mercurial > hg > sv-dependency-builds
comparison src/capnproto-git-20161025/doc/cxx.md @ 48:9530b331f8c1
Add Cap'n Proto source
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Tue, 25 Oct 2016 11:17:01 +0100 |
47:d93140aac40b | 48:9530b331f8c1 |
---|---|
1 --- | |
2 layout: page | |
3 title: C++ Serialization | |
4 --- | |
5 | |
6 # C++ Serialization | |
7 | |
8 The Cap'n Proto C++ runtime implementation provides an easy-to-use interface for manipulating | |
9 messages backed by fast pointer arithmetic. This page discusses the serialization layer of | |
10 the runtime; see [C++ RPC](cxxrpc.html) for information about the RPC layer. | |
11 | |
12 ## Example Usage | |
13 | |
14 For the Cap'n Proto definition: | |
15 | |
16 {% highlight capnp %} | |
17 struct Person { | |
18 id @0 :UInt32; | |
19 name @1 :Text; | |
20 email @2 :Text; | |
21 phones @3 :List(PhoneNumber); | |
22 | |
23 struct PhoneNumber { | |
24 number @0 :Text; | |
25 type @1 :Type; | |
26 | |
27 enum Type { | |
28 mobile @0; | |
29 home @1; | |
30 work @2; | |
31 } | |
32 } | |
33 | |
34 employment :union { | |
35 unemployed @4 :Void; | |
36 employer @5 :Text; | |
37 school @6 :Text; | |
38 selfEmployed @7 :Void; | |
39 # We assume that a person is only one of these. | |
40 } | |
41 } | |
42 | |
43 struct AddressBook { | |
44 people @0 :List(Person); | |
45 } | |
46 {% endhighlight %} | |
47 | |
48 You might write code like: | |
49 | |
50 {% highlight c++ %} | |
51 #include "addressbook.capnp.h" | |
52 #include <capnp/message.h> | |
53 #include <capnp/serialize-packed.h> | |
54 #include <iostream> | |
55 | |
56 void writeAddressBook(int fd) { | |
57 ::capnp::MallocMessageBuilder message; | |
58 | |
59 AddressBook::Builder addressBook = message.initRoot<AddressBook>(); | |
60 ::capnp::List<Person>::Builder people = addressBook.initPeople(2); | |
61 | |
62 Person::Builder alice = people[0]; | |
63 alice.setId(123); | |
64 alice.setName("Alice"); | |
65 alice.setEmail("alice@example.com"); | |
66 // Type shown for explanation purposes; normally you'd use auto. | |
67 ::capnp::List<Person::PhoneNumber>::Builder alicePhones = | |
68 alice.initPhones(1); | |
69 alicePhones[0].setNumber("555-1212"); | |
70 alicePhones[0].setType(Person::PhoneNumber::Type::MOBILE); | |
71 alice.getEmployment().setSchool("MIT"); | |
72 | |
73 Person::Builder bob = people[1]; | |
74 bob.setId(456); | |
75 bob.setName("Bob"); | |
76 bob.setEmail("bob@example.com"); | |
77 auto bobPhones = bob.initPhones(2); | |
78 bobPhones[0].setNumber("555-4567"); | |
79 bobPhones[0].setType(Person::PhoneNumber::Type::HOME); | |
80 bobPhones[1].setNumber("555-7654"); | |
81 bobPhones[1].setType(Person::PhoneNumber::Type::WORK); | |
82 bob.getEmployment().setUnemployed(); | |
83 | |
84 writePackedMessageToFd(fd, message); | |
85 } | |
86 | |
87 void printAddressBook(int fd) { | |
88 ::capnp::PackedFdMessageReader message(fd); | |
89 | |
90 AddressBook::Reader addressBook = message.getRoot<AddressBook>(); | |
91 | |
92 for (Person::Reader person : addressBook.getPeople()) { | |
93 std::cout << person.getName().cStr() << ": " | |
94 << person.getEmail().cStr() << std::endl; | |
95 for (Person::PhoneNumber::Reader phone: person.getPhones()) { | |
96 const char* typeName = "UNKNOWN"; | |
97 switch (phone.getType()) { | |
98 case Person::PhoneNumber::Type::MOBILE: typeName = "mobile"; break; | |
99 case Person::PhoneNumber::Type::HOME: typeName = "home"; break; | |
100 case Person::PhoneNumber::Type::WORK: typeName = "work"; break; | |
101 } | |
102 std::cout << " " << typeName << " phone: " | |
103 << phone.getNumber().cStr() << std::endl; | |
104 } | |
105 Person::Employment::Reader employment = person.getEmployment(); | |
106 switch (employment.which()) { | |
107 case Person::Employment::UNEMPLOYED: | |
108 std::cout << " unemployed" << std::endl; | |
109 break; | |
110 case Person::Employment::EMPLOYER: | |
111 std::cout << " employer: " | |
112 << employment.getEmployer().cStr() << std::endl; | |
113 break; | |
114 case Person::Employment::SCHOOL: | |
115 std::cout << " student at: " | |
116 << employment.getSchool().cStr() << std::endl; | |
117 break; | |
118 case Person::Employment::SELF_EMPLOYED: | |
119 std::cout << " self-employed" << std::endl; | |
120 break; | |
121 } | |
122 } | |
123 } | |
124 {% endhighlight %} | |
125 | |
126 ## C++ Feature Usage: C++11, Exceptions | |
127 | |
128 This implementation makes use of C++11 features. If you are using GCC, you will need at least | |
129 version 4.7 to compile Cap'n Proto. If you are using Clang, you will need at least version 3.2. | |
130 These compilers require the flag `-std=c++11` to enable C++11 features -- your code which | |
131 `#include`s Cap'n Proto headers will need to be compiled with this flag. Other compilers have not | |
132 been tested at this time. | |
133 | |
134 This implementation prefers to handle errors using exceptions. Exceptions are only used in | |
135 circumstances that should never occur in normal operation. For example, exceptions are thrown | |
136 on assertion failures (indicating bugs in the code), network failures, and invalid input. | |
137 Exceptions thrown by Cap'n Proto are never part of the interface and never need to be caught in | |
138 correct usage. The purpose of throwing exceptions is to allow higher-level code a chance to | |
139 recover from unexpected circumstances without disrupting other work happening in the same process. | |
140 For example, a server that handles requests from multiple clients should, on exception, return an | |
141 error to the client that caused the exception and close that connection, but should continue | |
142 handling other connections normally. | |
143 | |
144 When Cap'n Proto code might throw an exception from a destructor, it first checks | |
145 `std::uncaught_exception()` to ensure that this is safe. If another exception is already active, | |
146 the new exception is assumed to be a side-effect of the main exception, and is either silently | |
147 swallowed or reported on a side channel. | |
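
The destructor check described above can be sketched with the standard library alone. This is a minimal illustration, not Cap'n Proto or KJ code: `Flusher`, `runScenario`, and the string used as a "side channel" are hypothetical names invented for the example. Note that `std::uncaught_exception()` is deprecated as of C++17 in favor of `std::uncaught_exceptions()`.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical resource whose cleanup step can fail. Not a Cap'n Proto or KJ type.
class Flusher {
public:
  Flusher(bool failOnFlush, std::string& sideChannel)
      : failOnFlush_(failOnFlush), sideChannel_(sideChannel) {}

  // Destructors are implicitly noexcept in C++11, so throwing must be opted into.
  ~Flusher() noexcept(false) {
    if (!failOnFlush_) return;
    if (std::uncaught_exception()) {
      // Another exception is already unwinding the stack. Throwing now would
      // call std::terminate(), so report on the side channel instead.
      sideChannel_ += "flush failed during unwind; ";
    } else {
      throw std::runtime_error("flush failed");
    }
  }

private:
  bool failOnFlush_;
  std::string& sideChannel_;
};

// Runs one scenario and returns a log of what the caller observed.
std::string runScenario(bool throwPrimary) {
  std::string log;
  try {
    Flusher flusher(true, log);
    if (throwPrimary) throw std::runtime_error("primary failure");
  } catch (const std::exception& e) {
    log += std::string("caught: ") + e.what();
  }
  return log;
}
```

With `throwPrimary` set, the caller sees only the primary failure and the cleanup failure lands in the log; without it, the destructor's own exception propagates normally.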
148 | |
149 In recognition of the fact that some teams prefer not to use exceptions, and that even enabling | |
150 exceptions in the compiler introduces overhead, Cap'n Proto allows you to disable them entirely | |
151 by registering your own exception callback. The callback will be called in place of throwing an | |
152 exception. The callback may abort the process, and is required to do so in certain circumstances | |
153 (when a fatal bug is detected). If the callback returns normally, Cap'n Proto will attempt | |
154 to continue by inventing "safe" values. This will lead to garbage output, but at least the program | |
155 will not crash. Your exception callback should set some sort of a flag indicating that an error | |
156 occurred, and somewhere up the stack you should check for that flag and cancel the operation. | |
157 See the header `kj/exception.h` for details on how to register an exception callback. | |
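
The "set a flag, return a safe value, check the flag higher up" pattern can be sketched as follows. This is deliberately not the KJ callback interface (see `kj/exception.h` for that); `ErrorState`, `reportError`, `parseId`, and `sumIds` are hypothetical names used only to show the control flow.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical error flag that a callback would set instead of throwing.
struct ErrorState {
  bool failed = false;
  std::string firstMessage;
};

// Stand-in for the callback: records the first error and returns normally.
void reportError(ErrorState& state, const std::string& message) {
  if (!state.failed) {
    state.failed = true;
    state.firstMessage = message;
  }
}

// Parses decimal digits. On bad input it reports the error and returns the
// "safe" value 0 so that the caller can keep running.
std::uint32_t parseId(const std::string& text, ErrorState& state) {
  if (text.empty()) {
    reportError(state, "empty id");
    return 0;
  }
  std::uint32_t value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      reportError(state, "invalid id: " + text);
      return 0;
    }
    value = value * 10 + static_cast<std::uint32_t>(c - '0');
  }
  return value;
}

// Higher-level operation: checks the flag once at the end and cancels,
// because any result computed after an error is garbage.
bool sumIds(const std::vector<std::string>& inputs, std::uint32_t& total) {
  ErrorState state;
  std::uint32_t sum = 0;
  for (const std::string& text : inputs) sum += parseId(text, state);
  if (state.failed) return false;
  total = sum;
  return true;
}
```

The important design point is that the low-level code never stops early; only the caller that owns the operation decides to discard the result.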
158 | |
159 ## KJ Library | |
160 | |
161 Cap'n Proto is built on top of a basic utility library called KJ. The two were actually developed | |
162 together -- KJ is simply the stuff which is not specific to Cap'n Proto serialization, and may be | |
163 useful to others independently of Cap'n Proto. For now, the two are distributed together. The | |
164 name "KJ" has no particular meaning; it was chosen to be short and easy-to-type. | |
165 | |
166 As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library. You may need | |
167 to explicitly link against both libraries: `-lcapnp -lkj` | |
168 | |
169 ## Generating Code | |
170 | |
171 To generate C++ code from your `.capnp` [interface definition](language.html), run: | |
172 | |
173 capnp compile -oc++ myproto.capnp | |
174 | |
175 This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`. | |
176 | |
177 To use this code in your app, you must link against both `libcapnp` and `libkj`. If you use | |
178 `pkg-config`, Cap'n Proto provides the `capnp` module to simplify discovery of compiler and linker | |
179 flags. | |
180 | |
181 If you use [RPC](cxxrpc.html) (i.e., your schema defines [interfaces](language.html#interfaces)), | |
182 then you will additionally need to link against `libcapnp-rpc` and `libkj-async`, or use the | |
183 `capnp-rpc` `pkg-config` module. | |
184 | |
185 ### Setting a Namespace | |
186 | |
187 You probably want your generated types to live in a C++ namespace. You will need to import | |
188 `/capnp/c++.capnp` and use the `namespace` annotation it defines: | |
189 | |
190 {% highlight capnp %} | |
191 using Cxx = import "/capnp/c++.capnp"; | |
192 $Cxx.namespace("foo::bar::baz"); | |
193 {% endhighlight %} | |
194 | |
195 Note that `capnp/c++.capnp` is installed in `$PREFIX/include` (`/usr/local/include` by default) | |
196 when you install the C++ runtime. The `capnp` tool automatically searches `/usr/include` and | |
197 `/usr/local/include` for imports that start with a `/`, so it should "just work". If you installed | |
198 somewhere else, you may need to add it to the search path with the `-I` flag to `capnp compile`, | |
199 which works much like the compiler flag of the same name. | |
200 | |
201 ## Types | |
202 | |
203 ### Primitive Types | |
204 | |
205 Primitive types map to the obvious C++ types: | |
206 | |
207 * `Bool` -> `bool` | |
208 * `IntNN` -> `intNN_t` | |
209 * `UIntNN` -> `uintNN_t` | |
210 * `Float32` -> `float` | |
211 * `Float64` -> `double` | |
212 * `Void` -> `::capnp::Void` (An empty struct; its only value is `::capnp::VOID`) | |
213 | |
214 ### Structs | |
215 | |
216 For each struct `Foo` in your interface, a C++ type named `Foo` is generated. This type itself is | |
217 really just a namespace; it contains two important inner classes: `Reader` and `Builder`. | |
218 | |
219 `Reader` represents a read-only instance of `Foo` while `Builder` represents a writable instance | |
220 (usually, one that you are building). Both classes behave like pointers, in that you can pass them | |
221 by value and they do not own the underlying data that they operate on. In other words, | |
222 `Foo::Builder` is like a pointer to a `Foo` while `Foo::Reader` is like a const pointer to a `Foo`. | |
223 | |
224 For every field `bar` defined in `Foo`, `Foo::Reader` has a method `getBar()`. For primitive types, | |
225 `get` just returns the value, but for structs, lists, and blobs, it returns a `Reader` for the | |
226 type. | |
227 | |
228 {% highlight c++ %} | |
229 // Example Reader methods: | |
230 | |
231 // myPrimitiveField @0 :Int32; | |
232 int32_t getMyPrimitiveField(); | |
233 | |
234 // myTextField @1 :Text; | |
235 ::capnp::Text::Reader getMyTextField(); | |
236 // (Note that Text::Reader may be implicitly cast to const char* and | |
237 // std::string.) | |
238 | |
239 // myStructField @2 :MyStruct; | |
240 MyStruct::Reader getMyStructField(); | |
241 | |
242 // myListField @3 :List(Float64); | |
243 ::capnp::List<double>::Reader getMyListField(); | |
244 {% endhighlight %} | |
245 | |
246 `Foo::Builder`, meanwhile, has several methods for each field `bar`: | |
247 | |
248 * `getBar()`: For primitives, returns the value. For composites, returns a Builder for the | |
249 composite. If a composite field has not been initialized (i.e. this is the first time it has | |
250 been accessed), it will be initialized to a copy of the field's default value before returning. | |
251 * `setBar(x)`: For primitives, sets the value to x. For composites, sets the value to a deep copy | |
252 of x, which must be a Reader for the type. | |
253 * `initBar(n)`: Only for lists and blobs. Sets the field to a newly-allocated list or blob | |
254 of size n and returns a Builder for it. The elements of the list are initialized to their empty | |
255 state (zero for numbers, default values for structs). | |
256 * `initBar()`: Only for structs. Sets the field to a newly-allocated struct and returns a | |
257 Builder for it. Note that the newly-allocated struct is initialized to the default value for | |
258 the struct's _type_ (i.e., all-zero) rather than the default value for the field `bar` (if it | |
259 has one). | |
260 * `hasBar()`: Only for pointer fields (e.g. structs, lists, blobs). Returns true if the pointer | |
261 has been initialized (non-null). (This method is also available on readers.) | |
262 * `adoptBar(x)`: Only for pointer fields. Adopts the orphaned object x, linking it into the field | |
263 `bar` without copying. See the section on orphans. | |
264 * `disownBar()`: Disowns the value pointed to by `bar`, setting the pointer to null and returning | |
265 its previous value as an orphan. See the section on orphans. | |
266 | |
267 {% highlight c++ %} | |
268 // Example Builder methods: | |
269 | |
270 // myPrimitiveField @0 :Int32; | |
271 int32_t getMyPrimitiveField(); | |
272 void setMyPrimitiveField(int32_t value); | |
273 | |
274 // myTextField @1 :Text; | |
275 ::capnp::Text::Builder getMyTextField(); | |
276 void setMyTextField(::capnp::Text::Reader value); | |
277 ::capnp::Text::Builder initMyTextField(size_t size); | |
278 // (Note that Text::Reader is implicitly constructable from const char* | |
279 // and std::string, and Text::Builder can be implicitly cast to | |
280 // these types.) | |
281 | |
282 // myStructField @2 :MyStruct; | |
283 MyStruct::Builder getMyStructField(); | |
284 void setMyStructField(MyStruct::Reader value); | |
285 MyStruct::Builder initMyStructField(); | |
286 | |
287 // myListField @3 :List(Float64); | |
288 ::capnp::List<double>::Builder getMyListField(); | |
289 void setMyListField(::capnp::List<double>::Reader value); | |
290 ::capnp::List<double>::Builder initMyListField(size_t size); | |
291 {% endhighlight %} | |
292 | |
293 ### Groups | |
294 | |
295 Groups look a lot like a combination of a nested type and a field of that type, except that you | |
296 cannot set, adopt, or disown a group -- you can only get and init it. | |
297 | |
298 ### Unions | |
299 | |
300 A named union (as opposed to an unnamed one) works just like a group, except with some additions: | |
301 | |
302 * For each field `foo`, the union reader and builder have a method `isFoo()` which returns true | |
303 if `foo` is the currently-set field in the union. | |
304 * The union reader and builder also have a method `which()` that returns an enum value indicating | |
305 which field is currently set. | |
306 * Calling the set, init, or adopt accessors for a field makes it the currently-set field. | |
307 * Calling the get or disown accessors on a field that isn't currently set will throw an | |
308 exception in debug mode or return garbage when `NDEBUG` is defined. | |
309 | |
310 Unnamed unions differ from named unions only in that the accessor methods from the union's members | |
311 are added directly to the containing type's reader and builder, rather than generating a nested | |
312 type. | |
313 | |
314 See the [example](#example-usage) at the top of the page for an example of unions. | |
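
To make the accessor rules above concrete, here is a toy model of a union written by hand with the standard library. It is not generated Cap'n Proto code: the class layout and the choice of `std::logic_error` are assumptions made for illustration, and only two of the four members of `employment` are modeled.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for the accessors of a named union; not generated code.
class Employment {
public:
  enum Which : std::uint16_t { UNEMPLOYED, EMPLOYER, SCHOOL, SELF_EMPLOYED };

  // which() and isFoo() report the currently-set member.
  Which which() const { return which_; }
  bool isUnemployed() const { return which_ == UNEMPLOYED; }
  bool isSchool() const { return which_ == SCHOOL; }

  // Calling a setter makes that member the currently-set one.
  void setUnemployed() {
    which_ = UNEMPLOYED;
    text_.clear();
  }
  void setSchool(std::string name) {
    which_ = SCHOOL;
    text_ = std::move(name);
  }

  // Getting a member that is not currently set is an error.
  const std::string& getSchool() const {
    if (which_ != SCHOOL) {
      throw std::logic_error("school is not the active union member");
    }
    return text_;
  }

private:
  Which which_ = UNEMPLOYED;  // Assumption: the first member is active initially.
  std::string text_;
};

// Dispatches on which() before touching any getter.
std::string describe(const Employment& employment) {
  switch (employment.which()) {
    case Employment::UNEMPLOYED: return "unemployed";
    case Employment::SCHOOL: return "student at: " + employment.getSchool();
    default: return "other";
  }
}
```

Switching on `which()` first, as `describe` does, is what keeps callers away from the "get a member that is not set" error path.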
315 | |
316 ### Lists | |
317 | |
318 Lists are represented by the type `capnp::List<T>`, where `T` is any of the primitive types, | |
319 any Cap'n Proto user-defined type, `capnp::Text`, `capnp::Data`, or `capnp::List<U>` | |
320 (to form a list of lists). | |
321 | |
322 The type `List<T>` itself is not instantiatable, but has two inner classes: `Reader` and `Builder`. | |
323 As with structs, these types behave like pointers to read-only and read-write data, respectively. | |
324 | |
325 Both `Reader` and `Builder` implement `size()`, `operator[]`, `begin()`, and `end()`, as good C++ | |
326 containers should. Note, though, that `operator[]` is read-only -- you cannot use it to assign | |
327 the element, because that would require returning a reference, which is impossible because the | |
328 underlying data may not be in your CPU's native format (e.g., wrong byte order). Instead, to | |
329 assign an element of a list, you must use `builder.set(index, value)`. | |
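
The reason `operator[]` cannot hand out a reference is easier to see in a toy list that stores its elements in a fixed byte order. The sketch below assumes little-endian storage and is not the Cap'n Proto implementation; `WireU32List` and `raw()` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy list of 32-bit integers stored as little-endian bytes, regardless of
// the host CPU's byte order. Not a Cap'n Proto type.
class WireU32List {
public:
  explicit WireU32List(std::size_t count) : bytes_(count * 4, 0) {}

  std::size_t size() const { return bytes_.size() / 4; }

  // Returns by value: the element has to be decoded, so there is no native
  // uint32_t object in memory that a reference could point at.
  std::uint32_t operator[](std::size_t index) const {
    const std::uint8_t* p = &bytes_[index * 4];
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
  }

  // Assignment goes through set(), which encodes the value into the buffer.
  void set(std::size_t index, std::uint32_t value) {
    std::uint8_t* p = &bytes_[index * 4];
    p[0] = static_cast<std::uint8_t>(value & 0xff);
    p[1] = static_cast<std::uint8_t>((value >> 8) & 0xff);
    p[2] = static_cast<std::uint8_t>((value >> 16) & 0xff);
    p[3] = static_cast<std::uint8_t>((value >> 24) & 0xff);
  }

  // Exposes the encoded bytes for inspection.
  const std::vector<std::uint8_t>& raw() const { return bytes_; }

private:
  std::vector<std::uint8_t> bytes_;
};
```

Usage mirrors the rule in the text: read with `list[i]`, write with `list.set(i, value)`.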
330 | |
331 For `List<Foo>` where `Foo` is a non-primitive type, the type returned by `operator[]` and | |
332 `iterator::operator*()` is `Foo::Reader` (for `List<Foo>::Reader`) or `Foo::Builder` | |
333 (for `List<Foo>::Builder`). The builder's `set` method takes a `Foo::Reader` as its second | |
334 parameter. | |
335 | |
336 For lists of lists or lists of blobs, the builder also has a method `init(index, size)` which sets | |
337 the element at the given index to a newly-allocated value with the given size and returns a builder | |
338 for it. Struct lists do not have an `init` method because all elements are initialized to empty | |
339 values when the list is created. | |
340 | |
341 ### Enums | |
342 | |
343 Cap'n Proto enums become C++11 "enum classes". That means they behave like any other enum, but | |
344 the enum's values are scoped within the type. E.g. for an enum `Foo` with value `bar`, you must | |
345 refer to the value as `Foo::BAR`. | |
346 | |
347 To match prevailing C++ style, an enum's value names are converted to UPPERCASE_WITH_UNDERSCORES | |
348 (whereas in the schema language you'd write them in camelCase). | |
349 | |
350 Keep in mind when writing `switch` blocks that an enum read off the wire may have a numeric | |
351 value that is not listed in its definition. This may be the case if the sender is using a newer | |
352 version of the protocol, or if the message is corrupt or malicious. In C++11, enums are allowed | |
353 to have any value that is within the range of their base type, which for Cap'n Proto enums is | |
354 `uint16_t`. | |
355 | |
356 ### Blobs (Text and Data) | |
357 | |
358 Blobs are manipulated using the classes `capnp::Text` and `capnp::Data`. These classes are, | |
359 again, just containers for inner classes `Reader` and `Builder`. These classes are iterable and | |
360 implement `size()` and `operator[]` methods. `Builder::operator[]` even returns a reference | |
361 (unlike with `List<T>`). `Text::Reader` additionally has a method `cStr()` which returns a | |
362 NUL-terminated `const char*`. | |
363 | |
364 As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying | |
365 type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format. This is | |
366 accomplished without actually `#include`ing `<string>`, since some clients do not want to rely | |
367 on this rather-bulky header. In fact, any class which defines a `.c_str()` method will be | |
368 implicitly convertible in this way. Unfortunately, this trick doesn't work on GCC 4.7. | |
369 | |
370 ### Interfaces | |
371 | |
372 [Interfaces (RPC) have their own page.](cxxrpc.html) | |
373 | |
374 ### Generics | |
375 | |
376 [Generic types](language.html#generic-types) become templates in C++. The outer type (the one whose | |
377 name matches the schema declaration's name) is templatized; the inner `Reader` and `Builder` types | |
378 are not, because they inherit the parameters from the outer type. Similarly, template parameters | |
379 should refer to outer types, not `Reader` or `Builder` types. | |
380 | |
381 For example, given: | |
382 | |
383 {% highlight capnp %} | |
384 struct Map(Key, Value) { | |
385 entries @0 :List(Entry); | |
386 struct Entry { | |
387 key @0 :Key; | |
388 value @1 :Value; | |
389 } | |
390 } | |
391 | |
392 struct People { | |
393 byName @0 :Map(Text, Person); | |
394 # Maps names to Person instances. | |
395 } | |
396 {% endhighlight %} | |
397 | |
398 You might write code like: | |
399 | |
400 {% highlight c++ %} | |
401 void processPeople(People::Reader people) { | |
402 Map<Text, Person>::Reader reader = people.getByName(); | |
403 capnp::List<Map<Text, Person>::Entry>::Reader entries = | |
404 reader.getEntries(); | |
405 for (auto entry: entries) { | |
406 processPerson(entry); | |
407 } | |
408 } | |
409 {% endhighlight %} | |
410 | |
411 Note that all template parameters will be specified with a default value of `AnyPointer`. | |
412 Therefore, the type `Map<>` is equivalent to `Map<capnp::AnyPointer, capnp::AnyPointer>`. | |
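
The effect of defaulted template parameters can be checked with a small stand-alone model. The types below are empty placeholders defined only for this sketch; they are not the real `capnp` classes.

```cpp
#include <type_traits>

// Empty placeholders standing in for capnp::AnyPointer, capnp::Text and a
// user-defined Person struct.
struct AnyPointer {};
struct Text {};
struct Person {};

// The outer type is the template; the nested types are ordinary classes that
// inherit the parameters from it.
template <typename Key = AnyPointer, typename Value = AnyPointer>
struct Map {
  struct Reader {};
  struct Entry {
    struct Reader {};
  };
};

// Map<> names the same type as the fully spelled-out default instantiation.
static_assert(std::is_same<Map<>, Map<AnyPointer, AnyPointer>>::value,
              "defaulted parameters should resolve to AnyPointer");

// Different arguments yield unrelated Reader types.
static_assert(!std::is_same<Map<Text, Person>::Reader, Map<>::Reader>::value,
              "Reader types follow the outer type's parameters");
```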
413 | |
414 ### Constants | |
415 | |
416 Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style | |
417 (whereas in the schema language you’d write them in camelCase). Primitive constants are just | |
418 `constexpr` values. Pointer-type constants (e.g. structs, lists, and blobs) are represented | |
419 using a proxy object that can be converted to the relevant `Reader` type, either implicitly or | |
420 using the unary `*` or `->` operators. | |
421 | |
422 ## Messages and I/O | |
423 | |
424 To create a new message, you must start by creating a `capnp::MessageBuilder` | |
425 (`capnp/message.h`). This is an abstract type which you can implement yourself, but most users | |
426 will want to use `capnp::MallocMessageBuilder`. Once your message is constructed, write it to | |
427 a file descriptor with `capnp::writeMessageToFd(fd, builder)` (`capnp/serialize.h`) or | |
428 `capnp::writePackedMessageToFd(fd, builder)` (`capnp/serialize-packed.h`). | |
429 | |
430 To read a message, you must create a `capnp::MessageReader`, which is another abstract type. | |
431 Implementations are specific to the data source. You can use `capnp::StreamFdMessageReader` | |
432 (`capnp/serialize.h`) or `capnp::PackedFdMessageReader` (`capnp/serialize-packed.h`) | |
433 to read from file descriptors; both take the file descriptor as a constructor argument. | |
434 | |
435 Note that if your stream contains additional data after the message, `PackedFdMessageReader` may | |
436 accidentally read some of that data, since it does buffered I/O. To make this work correctly, you | |
437 will need to set up a multi-use buffered stream. Buffered I/O may also be a good idea with | |
438 `StreamFdMessageReader` and also when writing, for performance reasons. See `capnp/io.h` for | |
439 details. | |
440 | |
441 There is an [example](#example-usage) of all this at the beginning of this page. | |
442 | |
443 ### Using mmap | |
444 | |
445 Cap'n Proto can be used together with `mmap()` (or Win32's `MapViewOfFile()`) for extremely fast | |
446 reads, especially when you only need to use a subset of the data in the file. Currently, | |
447 Cap'n Proto is not well-suited for _writing_ via `mmap()`, only reading, but this is only because | |
448 we have not yet invented a mutable segment framing format -- the underlying design should | |
449 eventually work for both. | |
450 | |
451 To take advantage of `mmap()` at read time, write your file in regular serialized (but NOT packed) | |
452 format -- that is, use `writeMessageToFd()`, _not_ `writePackedMessageToFd()`. Now, `mmap()` in | |
453 the entire file, and then pass the mapped memory to the constructor of | |
454 `capnp::FlatArrayMessageReader` (defined in `capnp/serialize.h`). That's it. You can use the | |
455 reader just like a normal `StreamFdMessageReader`. The operating system will automatically page | |
456 in data from disk as you read it. | |
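
The mapping step itself needs only POSIX calls. The sketch below maps a file read-only and exposes it as an array of 8-byte words; `MappedWords` is a hypothetical helper, not part of Cap'n Proto. Handing the mapped words to `capnp::FlatArrayMessageReader` is left as a comment, since the exact call should be checked against `capnp/serialize.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only mapping of a whole file.
class MappedWords {
public:
  explicit MappedWords(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return;
    struct stat info;
    if (fstat(fd, &info) == 0 && info.st_size > 0) {
      std::size_t bytes = static_cast<std::size_t>(info.st_size);
      void* base = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
      if (base != MAP_FAILED) {
        base_ = base;
        bytes_ = bytes;
      }
    }
    close(fd);  // The mapping remains valid after the descriptor is closed.
  }

  ~MappedWords() {
    if (base_ != nullptr) munmap(base_, bytes_);
  }

  MappedWords(const MappedWords&) = delete;
  MappedWords& operator=(const MappedWords&) = delete;

  bool ok() const { return base_ != nullptr; }

  // The mapping is page-aligned, so viewing it as 8-byte words is safe.
  // With Cap'n Proto, this pointer and count would be wrapped in an array of
  // words and passed to the FlatArrayMessageReader constructor.
  const std::uint64_t* begin() const {
    return static_cast<const std::uint64_t*>(base_);
  }
  std::size_t wordCount() const { return bytes_ / sizeof(std::uint64_t); }

private:
  void* base_ = nullptr;
  std::size_t bytes_ = 0;
};
```

The mapping must stay alive for as long as any reader built on top of it is in use, which is why the wrapper is non-copyable and unmaps in its destructor.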
457 | |
458 `mmap()` works best when reading from flash media, or when the file is already hot in cache. | |
459 It works less well with slow rotating disks. Here, disk seeks make random access relatively | |
460 expensive. Also, if I/O throughput is your bottleneck, then the fact that mmapped data cannot | |
461 be packed or compressed may hurt you. However, it all depends on what fraction of the file you're | |
462 actually reading -- if you only pull one field out of one deeply-nested struct in a huge tree, it | |
463 may still be a win. The only way to know for sure is to do benchmarks! (But be careful to make | |
464 sure your benchmark is actually interacting with disk and not cache.) | |
465 | |
466 ## Dynamic Reflection | |
467 | |
468 Sometimes you want to write generic code that operates on arbitrary types, iterating over the | |
469 fields or looking them up by name. For example, you might want to write code that encodes | |
470 arbitrary Cap'n Proto types in JSON format. This requires something like "reflection", but C++ | |
471 does not offer reflection. Also, you might even want to operate on types that aren't compiled | |
472 into the binary at all, but only discovered at runtime. | |
473 | |
474 The C++ API supports inspecting schemas at runtime via the interface defined in | |
475 `capnp/schema.h`, and dynamically reading and writing instances of arbitrary types via | |
476 `capnp/dynamic.h`. Here's the example from the beginning of this file rewritten in terms | |
477 of the dynamic API: | |
478 | |
479 {% highlight c++ %} | |
480 #include "addressbook.capnp.h" | |
481 #include <capnp/message.h> | |
482 #include <capnp/serialize-packed.h> | |
483 #include <iostream> | |
484 #include <capnp/schema.h> | |
485 #include <capnp/dynamic.h> | |
486 | |
487 using ::capnp::DynamicValue; | |
488 using ::capnp::DynamicStruct; | |
489 using ::capnp::DynamicEnum; | |
490 using ::capnp::DynamicList; | |
491 using ::capnp::List; | |
492 using ::capnp::Schema; | |
493 using ::capnp::StructSchema; | |
494 using ::capnp::EnumSchema; | |
495 | |
496 using ::capnp::Void; | |
497 using ::capnp::Text; | |
498 using ::capnp::MallocMessageBuilder; | |
499 using ::capnp::PackedFdMessageReader; | |
500 | |
501 void dynamicWriteAddressBook(int fd, StructSchema schema) { | |
502 // Write a message using the dynamic API to set each | |
503 // field by text name. This isn't something you'd | |
504 // normally want to do; it's just for illustration. | |
505 | |
506 MallocMessageBuilder message; | |
507 | |
508 // Types shown for explanation purposes; normally you'd | |
509 // use auto. | |
510 DynamicStruct::Builder addressBook = | |
511 message.initRoot<DynamicStruct>(schema); | |
512 | |
513 DynamicList::Builder people = | |
514 addressBook.init("people", 2).as<DynamicList>(); | |
515 | |
516 DynamicStruct::Builder alice = | |
517 people[0].as<DynamicStruct>(); | |
518 alice.set("id", 123); | |
519 alice.set("name", "Alice"); | |
520 alice.set("email", "alice@example.com"); | |
521 auto alicePhones = alice.init("phones", 1).as<DynamicList>(); | |
522 auto phone0 = alicePhones[0].as<DynamicStruct>(); | |
523 phone0.set("number", "555-1212"); | |
524 phone0.set("type", "mobile"); | |
525 alice.get("employment").as<DynamicStruct>() | |
526 .set("school", "MIT"); | |
527 | |
528 auto bob = people[1].as<DynamicStruct>(); | |
529 bob.set("id", 456); | |
530 bob.set("name", "Bob"); | |
531 bob.set("email", "bob@example.com"); | |
532 | |
533 // Some magic: We can convert a dynamic sub-value back to | |
534 // the native type with as<T>()! | |
535 List<Person::PhoneNumber>::Builder bobPhones = | |
536 bob.init("phones", 2).as<List<Person::PhoneNumber>>(); | |
537 bobPhones[0].setNumber("555-4567"); | |
538 bobPhones[0].setType(Person::PhoneNumber::Type::HOME); | |
539 bobPhones[1].setNumber("555-7654"); | |
540 bobPhones[1].setType(Person::PhoneNumber::Type::WORK); | |
541 bob.get("employment").as<DynamicStruct>() | |
542 .set("unemployed", ::capnp::VOID); | |
543 | |
544 writePackedMessageToFd(fd, message); | |
545 } | |
546 | |
547 void dynamicPrintValue(DynamicValue::Reader value) { | |
548 // Print an arbitrary message via the dynamic API by | |
549 // iterating over the schema. Look at the handling | |
550 // of STRUCT in particular. | |
551 | |
552 switch (value.getType()) { | |
553 case DynamicValue::VOID: | |
554 std::cout << ""; | |
555 break; | |
556 case DynamicValue::BOOL: | |
557 std::cout << (value.as<bool>() ? "true" : "false"); | |
558 break; | |
559 case DynamicValue::INT: | |
560 std::cout << value.as<int64_t>(); | |
561 break; | |
562 case DynamicValue::UINT: | |
563 std::cout << value.as<uint64_t>(); | |
564 break; | |
565 case DynamicValue::FLOAT: | |
566 std::cout << value.as<double>(); | |
567 break; | |
568 case DynamicValue::TEXT: | |
569 std::cout << '\"' << value.as<Text>().cStr() << '\"'; | |
570 break; | |
571 case DynamicValue::LIST: { | |
572 std::cout << "["; | |
573 bool first = true; | |
574 for (auto element: value.as<DynamicList>()) { | |
575 if (first) { | |
576 first = false; | |
577 } else { | |
578 std::cout << ", "; | |
579 } | |
580 dynamicPrintValue(element); | |
581 } | |
582 std::cout << "]"; | |
583 break; | |
584 } | |
585 case DynamicValue::ENUM: { | |
586 auto enumValue = value.as<DynamicEnum>(); | |
587 KJ_IF_MAYBE(enumerant, enumValue.getEnumerant()) { | |
588 std::cout << | |
589 enumerant->getProto().getName().cStr(); | |
590 } else { | |
591 // Unknown enum value; output raw number. | |
592 std::cout << enumValue.getRaw(); | |
593 } | |
594 break; | |
595 } | |
596 case DynamicValue::STRUCT: { | |
597 std::cout << "("; | |
598 auto structValue = value.as<DynamicStruct>(); | |
599 bool first = true; | |
600 for (auto field: structValue.getSchema().getFields()) { | |
601 if (!structValue.has(field)) continue; | |
602 if (first) { | |
603 first = false; | |
604 } else { | |
605 std::cout << ", "; | |
606 } | |
607 std::cout << field.getProto().getName().cStr() | |
608 << " = "; | |
609 dynamicPrintValue(structValue.get(field)); | |
610 } | |
611 std::cout << ")"; | |
612 break; | |
613 } | |
614 default: | |
615 // There are other types, we aren't handling them. | |
616 std::cout << "?"; | |
617 break; | |
618 } | |
619 } | |
620 | |
621 void dynamicPrintMessage(int fd, StructSchema schema) { | |
622 PackedFdMessageReader message(fd); | |
623 dynamicPrintValue(message.getRoot<DynamicStruct>(schema)); | |
624 std::cout << std::endl; | |
625 } | |
626 {% endhighlight %} | |
627 | |
628 Notes about the dynamic API: | |
629 | |
630 * You can implicitly cast any compiled Cap'n Proto struct reader/builder type directly to | |
631 `DynamicStruct::Reader`/`DynamicStruct::Builder`. Similarly with `List<T>` and `DynamicList`, | |
632 and even enum types and `DynamicEnum`. Finally, all valid Cap'n Proto field types may be | |
633 implicitly converted to `DynamicValue`. | |
634 | |
635 * You can load schemas dynamically at runtime using `SchemaLoader` (`capnp/schema-loader.h`) and | |
636 use the Dynamic API to manipulate objects of these types. `MessageBuilder` and `MessageReader` | |
637 have methods for accessing the message root using a dynamic schema. | |
638 | |
639 * While `SchemaLoader` loads binary schemas, you can also parse directly from text using | |
640 `SchemaParser` (`capnp/schema-parser.h`). However, this requires linking against `libcapnpc` | |
641 (in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient. If | |
642 you can arrange to use only binary schemas at runtime, you'll be better off. | |
643 | |
644 * Unlike with Protobufs, there is no "global registry" of compiled-in types. To get the schema | |
645 for a compiled-in type, use `capnp::Schema::from<MyType>()`. | |
646 | |
647 * Unlike with Protobufs, the overhead of supporting reflection is small. Generated `.capnp.c++` | |
648 files contain only some embedded const data structures describing the schema, no code at all, | |
649 and the runtime library support code is relatively small. Moreover, if you do not use the | |
650 dynamic API or the schema API, you do not even need to link their implementations into your | |
651 executable. | |
652 | |
653 * The dynamic API performs type checks at runtime. In case of error, it will throw an exception. | |
654 If you compile with `-fno-exceptions`, it will crash instead. Correct usage of the API should | |
655 never throw, but bugs happen. Enabling and catching exceptions will make your code more robust. | |
656 | |
657 * Loading user-provided schemas has security implications: it greatly increases the attack | |
658 surface of the Cap'n Proto library. In particular, it is easy for an attacker to trigger | |
659 exceptions. To protect yourself, you are strongly advised to enable exceptions and catch them. | |
660 | |
661 ## Orphans | |
662 | |
663 An "orphan" is a Cap'n Proto object that is disconnected from the message structure. That is, | |
664 it is not the root of a message, and there is no other Cap'n Proto object holding a pointer to it. | |
665 Thus, it has no parents. Orphans are an advanced feature that can help avoid copies and make it | |
666 easier to use Cap'n Proto objects as part of your application's internal state. Typical | |
667 applications probably won't use orphans. | |
668 | |
669 The class `capnp::Orphan<T>` (defined in `<capnp/orphan.h>`) represents a pointer to an orphaned | |
670 object of type `T`. `T` can be any struct type, `List<T>`, `Text`, or `Data`. E.g. | |
671 `capnp::Orphan<Person>` would be an orphaned `Person` structure. `Orphan<T>` is a move-only class, | |
672 similar to `std::unique_ptr<T>`. This prevents two different objects from adopting the same | |
673 orphan, which would result in an invalid message. | |
674 | |
675 An orphan can be "adopted" by another object to link it into the message structure. Conversely, | |
676 an object can "disown" one of its pointers, causing the pointed-to object to become an orphan. | |
677 Every pointer-typed field `foo` provides builder methods `adoptFoo()` and `disownFoo()` for these | |
678 purposes. Again, these methods use C++11 move semantics. To use them, you will need to be | |
679 familiar with `std::move()` (or the equivalent but shorter-named `kj::mv()`). | |
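
Because `Orphan<T>` is compared above to `std::unique_ptr<T>`, the move-only hand-off can be illustrated with standard C++ alone. The following is an analogy, not the Cap'n Proto API: `ToyPerson`, `transferName()`, and the toy `adoptName()` / `disownName()` methods are hypothetical stand-ins that only mimic the naming pattern of generated builder methods, with `std::unique_ptr` playing the role of `capnp::Orphan`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a struct with one pointer-typed field `name`.
// A real Cap'n Proto builder keeps the value inside the message; here a
// std::unique_ptr models the "at most one owner" rule.
class ToyPerson {
public:
  // Detach the field's value. The field becomes null and the caller now
  // holds the "orphan".
  std::unique_ptr<std::string> disownName() { return std::move(name_); }

  // Link an orphan into this object. The rvalue-reference parameter forces
  // callers to write std::move(), so one orphan cannot be adopted twice.
  void adoptName(std::unique_ptr<std::string>&& orphan) {
    assert(orphan != nullptr);  // an already-moved-from orphan is a caller bug
    name_ = std::move(orphan);
  }

  bool hasName() const { return name_ != nullptr; }

  const std::string& getName() const {
    assert(hasName());
    return *name_;
  }

private:
  std::unique_ptr<std::string> name_;
};

// Moves the name from `from` to `to` without copying the string.
inline void transferName(ToyPerson& from, ToyPerson& to) {
  std::unique_ptr<std::string> orphan = from.disownName();
  to.adoptName(std::move(orphan));
}
```

After `transferName(a, b)`, `a` no longer has a name and `b` owns the original string. The toy only shows why `std::move()` (or `kj::mv()`) is needed at the call site; it does not model the rule that an orphan can be adopted only within the message it lives in.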
680 | |
681 Even though an orphan is unlinked from the message tree, it still resides inside memory allocated | |
682 for a particular message (i.e. a particular `MessageBuilder`). An orphan can only be adopted by | |
683 objects that live in the same message. To move objects between messages, you must perform a copy. | |
684 If the message is serialized while an `Orphan<T>` living within it still exists, the orphan's | |
685 content will be part of the serialized message, but the only way the receiver could find it is by | |
686 investigating the raw message; the Cap'n Proto API provides no way to detect or read it. | |
687 | |
688 To construct an orphan from scratch (without having some other object disown it), you need an | |
689 `Orphanage`, which is essentially an orphan factory associated with some message. You can get one | |
690 by calling the `MessageBuilder`'s `getOrphanage()` method, or by calling the static method | |
691 `Orphanage::getForMessageContaining(builder)` and passing it any struct or list builder. | |
692 | |
693 Note that when an `Orphan<T>` goes out-of-scope without being adopted, the underlying memory that | |
694 it occupied is overwritten with zeros. If you use packed serialization, these zeros will take very | |
695 little bandwidth on the wire, but will still waste memory on the sending and receiving ends. | |
696 Generally, you should avoid allocating message objects that won't be used, or if you cannot avoid | |
697 it, arrange to copy the entire message over to a new `MessageBuilder` before serializing, since | |
698 only the reachable objects will be copied. | |
699 | |
700 ## Reference | |
701 | |
702 The runtime library contains lots of useful features not described on this page. For now, the | |
703 best reference is the header files. See: | |
704 | |
705 capnp/list.h | |
706 capnp/blob.h | |
707 capnp/message.h | |
708 capnp/serialize.h | |
709 capnp/serialize-packed.h | |
710 capnp/schema.h | |
711 capnp/schema-loader.h | |
712 capnp/dynamic.h | |
713 | |
714 ## Tips and Best Practices | |
715 | |
716 Here are some tips for using the C++ Cap'n Proto runtime most effectively: | |
717 | |
718 * Accessor methods for primitive (non-pointer) fields are fast and inline. They should be just | |
719 as fast as accessing a struct field through a pointer. | |
720 | |
721 * Accessor methods for pointer fields, on the other hand, are not inline, as they need to validate | |
722 the pointer. If you intend to access the same pointer multiple times, it is a good idea to | |
723 save the value to a local variable to avoid repeating this work. This is generally not a | |
724 problem given C++11's `auto`. | |
725 | |
726 Example: | |
727 | |
728 // BAD | |
729 frob(foo.getBar().getBaz(), | |
730 foo.getBar().getQux(), | |
731 foo.getBar().getCorge()); | |
732 | |
733 // GOOD | |
734 auto bar = foo.getBar(); | |
735 frob(bar.getBaz(), bar.getQux(), bar.getCorge()); | |
736 | |
737 It is especially important to use this style when reading messages, for another reason: as | |
738 described under the "security tips" section, below, every time you `get` a pointer, Cap'n Proto | |
739 increments a counter by the size of the target object. If that counter hits a pre-defined limit, | |
740 an exception is thrown (or a default value is returned, if exceptions are disabled), to prevent | |
741 a malicious client from sending your server into an infinite loop with a specially-crafted | |
742 message. If you repeatedly `get` the same object, you are repeatedly counting the same bytes, | |
743 and so you may hit the limit prematurely. (Since Cap'n Proto readers are backed directly by | |
744 the underlying message buffer and do not have anywhere else to store per-object information, it | |
745 is impossible to remember whether you've seen a particular object already.) | |
746 | |
747 * Internally, all pointer fields start out "null", even if they have default values. When you have | |
748 a pointer field `foo` and you call `getFoo()` on the containing struct's `Reader`, if the field | |
749 is "null", you will receive a reader for that field's default value. This reader is backed by | |
750 read-only memory; nothing is allocated. However, when you call `get` on a _builder_, and the | |
751 field is null, then the implementation must make a _copy_ of the default value to return to you. | |
752 Thus, you've caused the field to become non-null, just by "reading" it. On the other hand, if | |
753 you call `init` on that field, you are explicitly replacing whatever value is already there | |
754 (null or not) with a newly-allocated instance, and that newly-allocated instance is _not_ a | |
755 copy of the field's default value, but just a completely-uninitialized instance of the | |
756 appropriate type. | |
757 | |
758 * It is possible to receive a struct value constructed from a newer version of the protocol than | |
759 the one your binary was built with, and that struct might have extra fields that you don't know | |
760 about. The Cap'n Proto implementation tries to avoid discarding this extra data. If you copy | |
761 the struct from one message to another (e.g. by calling a set() method on a parent object), the | |
762 extra fields will be preserved. This makes it possible to build proxies that receive messages | |
763 and forward them on without having to rebuild the proxy every time a new field is added. You | |
764 must be careful, however: in some cases, it's not possible to retain the extra fields, because | |
765 they need to be copied into a space that is allocated before the expected content is known. | |
766 In particular, lists of structs are represented as a flat array, not as an array of pointers. | |
767 Therefore, all memory for all structs in the list must be allocated upfront. Hence, copying | |
768 a struct value from another message into an element of a list will truncate the value. Because | |
769 of this, the setter method for struct lists is called `setWithCaveats()` rather than just `set()`. | |
770 | |
771 * Messages are built in "arena" or "region" style: each object is allocated sequentially in | |
772 memory, until there is no more room in the segment, in which case a new segment is allocated, | |
773 and objects continue to be allocated sequentially in that segment. This design is what makes | |
774 Cap'n Proto possible at all, and it is very fast compared to other allocation strategies. | |
775 However, it has the disadvantage that if you allocate an object and then discard it, that memory | |
776 is lost. In fact, the empty space will still become part of the serialized message, even though | |
777 it is unreachable. The implementation will try to zero it out, so at least it should pack well, | |
778 but it's still better to avoid this situation. Some ways that this can happen include: | |
779 * If you `init` a field that is already initialized, the previous value is discarded. | |
780 * If you create an orphan that is never adopted into the message tree. | |
781 * If you use `adoptWithCaveats` to adopt an orphaned struct into a struct list, then a shallow | |
782 copy is necessary, since the struct list requires that its elements are sequential in memory. | |
783 The previous copy of the struct is discarded (although child objects are transferred properly). | |
784 * If you copy a struct value from another message using a `set` method, the copy will have the | |
785 same size as the original. However, the original could have been built with an older version | |
786 of the protocol which lacked some fields compared to the version your program was built with. | |
787 If you subsequently `get` that struct, the implementation will be forced to allocate a new | |
788 (shallow) copy which is large enough to hold all known fields, and the old copy will be | |
789 discarded. Child objects will be transferred over without being copied -- though they might | |
790 suffer from the same problem if you `get` them later on. | |
791 Sometimes, avoiding these problems is too inconvenient. Fortunately, it's also possible to | |
792 clean up the mess after-the-fact: if you copy the whole message tree into a fresh | |
793 `MessageBuilder`, only the reachable objects will be copied, leaving out all of the unreachable | |
794 dead space. | |
795 | |
796 In the future, Cap'n Proto may be improved such that it can re-use dead space in a message. | |
797 However, this will only improve things, not fix them entirely: fragmentation could still leave | |
798 dead space. | |
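
The dead-space behaviour described in the last tip can be modelled with a toy bump allocator. This is a minimal sketch under simplifying assumptions (a single segment, sizes tracked but no contents), and `ToyArena`, `ToyField`, `initField()`, and `copyReachable()` are hypothetical names, not part of the Cap'n Proto library.

```cpp
#include <cassert>
#include <cstddef>

// Toy bump allocator: hands out consecutive offsets and never frees.
// Hypothetical model only; real Cap'n Proto segments are more involved.
class ToyArena {
public:
  // Reserve `size` bytes at the end of the arena and return their offset.
  std::size_t allocate(std::size_t size) {
    assert(size > 0);
    std::size_t offset = used_;
    used_ += size;
    return offset;
  }

  // Total bytes that would be serialized, reachable or not.
  std::size_t totalSize() const { return used_; }

private:
  std::size_t used_ = 0;
};

// A "field" that points at one object inside an arena.
struct ToyField {
  bool isSet = false;
  std::size_t offset = 0;
  std::size_t size = 0;
};

// Like `init`: always allocates a fresh object. Any object the field
// pointed at before stays in the arena as unreachable dead space.
inline void initField(ToyArena& arena, ToyField& field, std::size_t size) {
  field.offset = arena.allocate(size);
  field.size = size;
  field.isSet = true;
}

// Copy only what is reachable from `field` into a fresh arena.
// The model copies sizes, not contents.
inline ToyField copyReachable(const ToyField& field, ToyArena& fresh) {
  ToyField copy;
  if (field.isSet) initField(fresh, copy, field.size);
  return copy;
}
```

Calling `initField()` twice on the same field, with 64 and then 32 bytes, leaves the arena at 96 bytes although only 32 are reachable. Passing the field through `copyReachable()` into a fresh arena yields 32 bytes, which mirrors the clean-up-by-copying approach suggested above.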
799 | |
800 ### Build Tips | |
801 | |
802 * If you are worried about the binary footprint of the Cap'n Proto library, consider statically | |
803 linking with the `--gc-sections` linker flag. This will allow the linker to drop pieces of the | |
804 library that you do not actually use. For example, many users do not use the dynamic schema and | |
805 reflection APIs, which contribute a large fraction of the Cap'n Proto library's overall | |
806 footprint. Keep in mind that if you ever stringify a Cap'n Proto type, the stringification code | |
807 depends on the dynamic API; consider only using stringification in debug builds. | |
808 | |
809 If you are dynamically linking against the system's shared copy of `libcapnp`, don't worry about | |
810 its binary size. Remember that only the code which you actually use will be paged into RAM, and | |
811 those pages are shared with other applications on the system. | |
812 | |
813 Also remember to strip your binary. In particular, `libcapnpc` (the schema parser) has | |
814 excessively large symbol names caused by its use of template-based parser combinators. Stripping | |
815 the binary greatly reduces its size. | |
816 | |
817 * The Cap'n Proto library has lots of debug-only asserts that are removed if you `#define NDEBUG`, | |
818 including in headers. If you care at all about performance, you should compile your production | |
819 binaries with the `-DNDEBUG` compiler flag. In fact, if Cap'n Proto detects that you have | |
820 optimization enabled but have not defined `NDEBUG`, it will define it for you (with a warning), | |
821 unless you define `DEBUG` or `KJ_DEBUG` to explicitly request debugging. | |
822 | |
823 ### Security Tips | |
824 | |
825 Cap'n Proto has not yet undergone security review. It most likely has some vulnerabilities. You | |
826 should not attempt to decode Cap'n Proto messages from sources you don't trust at this time. | |
827 | |
828 However, assuming the Cap'n Proto implementation hardens up eventually, then the following security | |
829 tips will apply. | |
830 | |
831 * It is highly recommended that you enable exceptions. When compiled with `-fno-exceptions`, | |
832 Cap'n Proto categorizes exceptions into "fatal" and "recoverable" varieties. Fatal exceptions | |
833 cause the server to crash, while recoverable exceptions are handled by logging an error and | |
834 returning a "safe" garbage value. Fatal is preferred in cases where it's unclear what kind of | |
835 garbage value would constitute "safe". The more of the library you use, the higher the chance | |
836 that you will leave yourself open to the possibility that an attacker could trigger a fatal | |
837 exception somewhere. If you enable exceptions, then you can catch the exception instead of | |
838 crashing, and return an error just to the attacker rather than to everyone using your server. | |
839 | |
840 Basic parsing of Cap'n Proto messages shouldn't ever trigger fatal exceptions (assuming the | |
841 implementation is not buggy). However, the dynamic API -- especially if you are loading schemas | |
842 controlled by the attacker -- is much more exception-happy. If you cannot use exceptions, then | |
843 you are advised to avoid the dynamic API when dealing with untrusted data. | |
844 | |
845 * If you need to process schemas from untrusted sources, take them in binary format, not text. | |
846 The text parser is a much larger attack surface and not designed to be secure. For instance, | |
847 as of this writing, it is trivial to deadlock the parser by simply writing a constant whose value | |
848 depends on itself. | |
849 | |
850 * Cap'n Proto automatically applies two artificial limits on messages for security reasons: | |
851 a limit on nesting depth, and a limit on total bytes traversed. | |
852 | |
853 * The nesting depth limit is designed to prevent stack overflow when handling a deeply-nested | |
854 recursive type, and defaults to 64. If your types aren't recursive, it is highly unlikely | |
855 that you would ever hit this limit, and even if they are recursive, it's still unlikely. | |
856 | |
857 * The traversal limit is designed to defend against maliciously-crafted messages which use | |
858 pointer cycles or overlapping objects to make a message appear much larger than its size on | |
859 the wire. While cycles and overlapping objects are illegal, they are hard to detect reliably. | |
860 Instead, Cap'n Proto places a limit on how many bytes worth of objects you can _dereference_ | |
861 before it throws an exception. This limit is assessed every time you follow a pointer. By | |
862 default, the limit is 64MiB (this may change in the future). `StreamFdMessageReader` will | |
863 actually reject upfront any message which is larger than the traversal limit, even before you | |
864 start reading it. | |
865 | |
866 If you need to write your code in such a way that you might frequently re-read the same | |
867 pointers, instead of increasing the traversal limit to the point where it is no longer useful, | |
868 consider simply copying the message into a new `MallocMessageBuilder` before starting. Then, | |
869 the traversal limit will be enforced only during the copy. There is no traversal limit on | |
870 objects once they live in a `MessageBuilder`, even if you use `.asReader()` to convert a | |
871 particular object's builder to the corresponding reader type. | |
872 | |
873 Both limits may be increased using `capnp::ReaderOptions`, defined in `capnp/message.h`. | |
874 | |
875 * Remember that enums on the wire may have a numeric value that does not match any value defined | |
876 in the schema. Your `switch()` statements must always have a safe default case. | |
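
The last tip, about enum values that are not in the schema, can be sketched in plain C++. `PhoneType` and `describePhoneType()` are hypothetical names chosen to echo the `Type` enum from the example schema; the sketch assumes the raw value arrives as a 16-bit integer and does not use the generated Cap'n Proto types.

```cpp
#include <cstdint>
#include <string>

// Hypothetical enum mirroring the `Type` enum from the example schema.
// The fixed underlying type makes any 16-bit value a valid (if unnamed)
// value of the enum, so the cast below is well-defined.
enum class PhoneType : std::uint16_t { MOBILE = 0, HOME = 1, WORK = 2 };

// Describe a phone type whose raw value came off the wire. A sender built
// against a newer schema may use values this binary does not know about.
inline std::string describePhoneType(std::uint16_t raw) {
  switch (static_cast<PhoneType>(raw)) {
    case PhoneType::MOBILE: return "mobile";
    case PhoneType::HOME:   return "home";
    case PhoneType::WORK:   return "work";
    default:                return "unknown";  // safe fallback for unlisted values
  }
}
```

With this shape, a value such as `7` falls through to the `default` case and is reported as `"unknown"` instead of being treated as one of the known enumerants.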
877 | |
878 ## Lessons Learned from Protocol Buffers | |
879 | |
880 The author of Cap'n Proto's C++ implementation also wrote (in the past) version 2 of Google's | |
881 Protocol Buffers. As a result, Cap'n Proto's implementation benefits from a number of lessons | |
882 learned the hard way: | |
883 | |
884 * Protobuf generated code is enormous due to the parsing and serializing code generated for every | |
885 class. This actually poses a significant problem in practice -- there exist server binaries | |
886 containing literally hundreds of megabytes of compiled protobuf code. Cap'n Proto generated code, | |
887 on the other hand, is almost entirely inlined accessors. The only things that go into `.capnp.o` | |
888 files are default values for pointer fields (if needed, which is rare) and the encoded schema | |
889 (just the raw bytes of a Cap'n-Proto-encoded schema structure). The latter could even be removed | |
890 if you don't use dynamic reflection. | |
891 | |
892 * The C++ Protobuf implementation used lots of dynamic initialization code (that runs before | |
893 `main()`) to do things like register types in global tables. This proved problematic for | |
894 programs which linked in lots of protocols but needed to start up quickly. Cap'n Proto does not | |
895 use any dynamic initializers anywhere, period. | |
896 | |
897 * The C++ Protobuf implementation makes heavy use of STL in its interface and implementation. | |
898 The proliferation of template instantiations gives the Protobuf runtime library a large footprint, | |
899 and using STL in the interface can lead to weird ABI problems and slow compiles. Cap'n Proto | |
900 does not use any STL containers in its interface and makes sparing use in its implementation. | |
901 As a result, the Cap'n Proto runtime library is smaller, and code that uses it compiles quickly. | |
902 | |
903 * The in-memory representation of messages in Protobuf-C++ involves many heap objects. Each | |
904 message (struct) is an object, each non-primitive repeated field allocates an array of pointers | |
905 to more objects, and each string may actually add two heap objects. Cap'n Proto by its nature | |
906 uses arena allocation, so the entire message is allocated in a few contiguous segments. This | |
907 means Cap'n Proto spends very little time allocating memory, stores messages more compactly, and | |
908 avoids memory fragmentation. | |
909 | |
910 * Related to the last point, Protobuf-C++ relies heavily on object reuse for performance. | |
911 Building or parsing into a newly-allocated Protobuf object is significantly slower than using | |
912 an existing one. However, the memory usage of a Protobuf object will tend to grow the more times | |
913 it is reused, particularly if it is used to parse messages of many different "shapes", so the | |
914 objects need to be deleted and re-allocated from time to time. All this makes tuning Protobufs | |
915 fairly tedious. In contrast, enabling memory reuse with Cap'n Proto is as simple as providing | |
916 a byte buffer to use as scratch space when you build or read in a message. Provide enough scratch | |
917 space to hold the entire message and Cap'n Proto won't allocate any memory. Or don't -- since | |
918 Cap'n Proto doesn't do much allocation in the first place, the benefits of scratch space are | |
919 small. |