---
layout: page
title: C++ Serialization
---

# C++ Serialization

The Cap'n Proto C++ runtime implementation provides an easy-to-use interface for manipulating
messages backed by fast pointer arithmetic. This page discusses the serialization layer of
the runtime; see [C++ RPC](cxxrpc.html) for information about the RPC layer.

## Example Usage

For the Cap'n Proto definition:

{% highlight capnp %}
struct Person {
  id @0 :UInt32;
  name @1 :Text;
  email @2 :Text;
  phones @3 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }

  employment :union {
    unemployed @4 :Void;
    employer @5 :Text;
    school @6 :Text;
    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
}

struct AddressBook {
  people @0 :List(Person);
}
{% endhighlight %}

You might write code like:

{% highlight c++ %}
#include "addressbook.capnp.h"
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>

void writeAddressBook(int fd) {
  ::capnp::MallocMessageBuilder message;

  AddressBook::Builder addressBook = message.initRoot<AddressBook>();
  ::capnp::List<Person>::Builder people = addressBook.initPeople(2);

  Person::Builder alice = people[0];
  alice.setId(123);
  alice.setName("Alice");
  alice.setEmail("alice@example.com");
  // Type shown for explanation purposes; normally you'd use auto.
  ::capnp::List<Person::PhoneNumber>::Builder alicePhones =
      alice.initPhones(1);
  alicePhones[0].setNumber("555-1212");
  alicePhones[0].setType(Person::PhoneNumber::Type::MOBILE);
  alice.getEmployment().setSchool("MIT");

  Person::Builder bob = people[1];
  bob.setId(456);
  bob.setName("Bob");
  bob.setEmail("bob@example.com");
  auto bobPhones = bob.initPhones(2);
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.getEmployment().setUnemployed();

  writePackedMessageToFd(fd, message);
}

void printAddressBook(int fd) {
  ::capnp::PackedFdMessageReader message(fd);

  AddressBook::Reader addressBook = message.getRoot<AddressBook>();

  for (Person::Reader person : addressBook.getPeople()) {
    std::cout << person.getName().cStr() << ": "
              << person.getEmail().cStr() << std::endl;
    for (Person::PhoneNumber::Reader phone: person.getPhones()) {
      const char* typeName = "UNKNOWN";
      switch (phone.getType()) {
        case Person::PhoneNumber::Type::MOBILE: typeName = "mobile"; break;
        case Person::PhoneNumber::Type::HOME: typeName = "home"; break;
        case Person::PhoneNumber::Type::WORK: typeName = "work"; break;
      }
      std::cout << " " << typeName << " phone: "
                << phone.getNumber().cStr() << std::endl;
    }
    Person::Employment::Reader employment = person.getEmployment();
    switch (employment.which()) {
      case Person::Employment::UNEMPLOYED:
        std::cout << " unemployed" << std::endl;
        break;
      case Person::Employment::EMPLOYER:
        std::cout << " employer: "
                  << employment.getEmployer().cStr() << std::endl;
        break;
      case Person::Employment::SCHOOL:
        std::cout << " student at: "
                  << employment.getSchool().cStr() << std::endl;
        break;
      case Person::Employment::SELF_EMPLOYED:
        std::cout << " self-employed" << std::endl;
        break;
    }
  }
}
{% endhighlight %}

## C++ Feature Usage: C++11, Exceptions

This implementation makes use of C++11 features. If you are using GCC, you will need at least
version 4.7 to compile Cap'n Proto. If you are using Clang, you will need at least version 3.2.
These compilers require the flag `-std=c++11` to enable C++11 features -- any code of yours that
`#include`s Cap'n Proto headers will need to be compiled with this flag. Other compilers have not
been tested at this time.

This implementation prefers to handle errors using exceptions. Exceptions are only used in
circumstances that should never occur in normal operation.
For example, exceptions are thrown
on assertion failures (indicating bugs in the code), network failures, and invalid input.
Exceptions thrown by Cap'n Proto are never part of the interface and never need to be caught in
correct usage. The purpose of throwing exceptions is to allow higher-level code a chance to
recover from unexpected circumstances without disrupting other work happening in the same process.
For example, a server that handles requests from multiple clients should, on exception, return an
error to the client that caused the exception and close that connection, but should continue
handling other connections normally.

When Cap'n Proto code might throw an exception from a destructor, it first checks
`std::uncaught_exception()` to ensure that this is safe. If another exception is already active,
the new exception is assumed to be a side-effect of the main exception, and is either silently
swallowed or reported on a side channel.

In recognition of the fact that some teams prefer not to use exceptions, and that even enabling
exceptions in the compiler introduces overhead, Cap'n Proto allows you to disable them entirely
by registering your own exception callback. The callback will be called in place of throwing an
exception. The callback may abort the process, and is required to do so in certain circumstances
(when a fatal bug is detected). If the callback returns normally, Cap'n Proto will attempt
to continue by inventing "safe" values. This will lead to garbage output, but at least the program
will not crash. Your exception callback should set some sort of a flag indicating that an error
occurred, and somewhere up the stack you should check for that flag and cancel the operation.
See the header `kj/exception.h` for details on how to register an exception callback.

## KJ Library

Cap'n Proto is built on top of a basic utility library called KJ. The two were actually developed
together -- KJ is simply the stuff which is not specific to Cap'n Proto serialization, and may be
useful to others independently of Cap'n Proto. For now, the two are distributed together. The
name "KJ" has no particular meaning; it was chosen to be short and easy to type.

As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library. You may need
to explicitly link against both libraries: `-lcapnp -lkj`

## Generating Code

To generate C++ code from your `.capnp` [interface definition](language.html), run:

    capnp compile -oc++ myproto.capnp

This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.

To use this code in your app, you must link against both `libcapnp` and `libkj`. If you use
`pkg-config`, Cap'n Proto provides the `capnp` module to simplify discovery of compiler and linker
flags.

If you use [RPC](cxxrpc.html) (i.e., your schema defines [interfaces](language.html#interfaces)),
then you will additionally need to link against `libcapnp-rpc` and `libkj-async`, or use the
`capnp-rpc` `pkg-config` module.

### Setting a Namespace

You probably want your generated types to live in a C++ namespace.
You will need to import
`/capnp/c++.capnp` and use the `namespace` annotation it defines:

{% highlight capnp %}
using Cxx = import "/capnp/c++.capnp";
$Cxx.namespace("foo::bar::baz");
{% endhighlight %}

Note that `capnp/c++.capnp` is installed in `$PREFIX/include` (`/usr/local/include` by default)
when you install the C++ runtime. The `capnp` tool automatically searches `/usr/include` and
`/usr/local/include` for imports that start with a `/`, so it should "just work". If you installed
somewhere else, you may need to add it to the search path with the `-I` flag to `capnp compile`,
which works much like the compiler flag of the same name.

## Types

### Primitive Types

Primitive types map to the obvious C++ types:

* `Bool` -> `bool`
* `IntNN` -> `intNN_t`
* `UIntNN` -> `uintNN_t`
* `Float32` -> `float`
* `Float64` -> `double`
* `Void` -> `::capnp::Void` (An empty struct; its only value is `::capnp::VOID`)

### Structs

For each struct `Foo` in your interface, a C++ type named `Foo` is generated. This type itself is
really just a namespace; it contains two important inner classes: `Reader` and `Builder`.

`Reader` represents a read-only instance of `Foo` while `Builder` represents a writable instance
(usually, one that you are building). Both classes behave like pointers, in that you can pass them
by value and they do not own the underlying data that they operate on. In other words,
`Foo::Builder` is like a pointer to a `Foo` while `Foo::Reader` is like a const pointer to a `Foo`.

For every field `bar` defined in `Foo`, `Foo::Reader` has a method `getBar()`.
For primitive types,
`get` just returns the value, but for structs, lists, and blobs, it returns a `Reader` for the
type.

{% highlight c++ %}
// Example Reader methods:

// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();

// myTextField @1 :Text;
::capnp::Text::Reader getMyTextField();
// (Note that Text::Reader may be implicitly cast to const char* and
// std::string.)

// myStructField @2 :MyStruct;
MyStruct::Reader getMyStructField();

// myListField @3 :List(Float64);
::capnp::List<double>::Reader getMyListField();
{% endhighlight %}

`Foo::Builder`, meanwhile, has several methods for each field `bar`:

* `getBar()`: For primitives, returns the value. For composites, returns a Builder for the
  composite. If a composite field has not been initialized (i.e. this is the first time it has
  been accessed), it will be initialized to a copy of the field's default value before returning.
* `setBar(x)`: For primitives, sets the value to x. For composites, sets the value to a deep copy
  of x, which must be a Reader for the type.
* `initBar(n)`: Only for lists and blobs. Sets the field to a newly-allocated list or blob
  of size n and returns a Builder for it. The elements of the list are initialized to their empty
  state (zero for numbers, default values for structs).
* `initBar()`: Only for structs. Sets the field to a newly-allocated struct and returns a
  Builder for it. Note that the newly-allocated struct is initialized to the default value for
  the struct's _type_ (i.e., all-zero) rather than the default value for the field `bar` (if it
  has one).
* `hasBar()`: Only for pointer fields (e.g. structs, lists, blobs). Returns true if the pointer
  has been initialized (non-null). (This method is also available on readers.)
* `adoptBar(x)`: Only for pointer fields. Adopts the orphaned object x, linking it into the field
  `bar` without copying. See the section on orphans.
* `disownBar()`: Disowns the value pointed to by `bar`, setting the pointer to null and returning
  its previous value as an orphan. See the section on orphans.

{% highlight c++ %}
// Example Builder methods:

// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();
void setMyPrimitiveField(int32_t value);

// myTextField @1 :Text;
::capnp::Text::Builder getMyTextField();
void setMyTextField(::capnp::Text::Reader value);
::capnp::Text::Builder initMyTextField(size_t size);
// (Note that Text::Reader is implicitly constructible from const char*
// and std::string, and Text::Builder can be implicitly cast to
// these types.)

// myStructField @2 :MyStruct;
MyStruct::Builder getMyStructField();
void setMyStructField(MyStruct::Reader value);
MyStruct::Builder initMyStructField();

// myListField @3 :List(Float64);
::capnp::List<double>::Builder getMyListField();
void setMyListField(::capnp::List<double>::Reader value);
::capnp::List<double>::Builder initMyListField(size_t size);
{% endhighlight %}

### Groups

Groups look a lot like a combination of a nested type and a field of that type, except that you
cannot set, adopt, or disown a group -- you can only get and init it.

### Unions

A named union (as opposed to an unnamed one) works just like a group, except with some additions:

* For each field `foo`, the union reader and builder have a method `isFoo()` which returns true
  if `foo` is the currently-set field in the union.
* The union reader and builder also have a method `which()` that returns an enum value indicating
  which field is currently set.
* Calling the set, init, or adopt accessors for a field makes it the currently-set field.
* Calling the get or disown accessors on a field that isn't currently set will throw an
  exception in debug mode or return garbage when `NDEBUG` is defined.

Unnamed unions differ from named unions only in that the accessor methods from the union's members
are added directly to the containing type's reader and builder, rather than generating a nested
type.

See the [example](#example-usage) at the top of the page for an example of unions.

### Lists

Lists are represented by the type `capnp::List<T>`, where `T` is any of the primitive types,
any Cap'n Proto user-defined type, `capnp::Text`, `capnp::Data`, or `capnp::List<U>`
(to form a list of lists).

The type `List<T>` itself is not instantiatable, but has two inner classes: `Reader` and `Builder`.
As with structs, these types behave like pointers to read-only and read-write data, respectively.

Both `Reader` and `Builder` implement `size()`, `operator[]`, `begin()`, and `end()`, as good C++
containers should.
Note, though, that `operator[]` is read-only -- you cannot use it to assign
the element, because that would require returning a reference, which is impossible because the
underlying data may not be in your CPU's native format (e.g., wrong byte order). Instead, to
assign an element of a list, you must use `builder.set(index, value)`.

For `List<Foo>` where `Foo` is a non-primitive type, the type returned by `operator[]` and
`iterator::operator*()` is `Foo::Reader` (for `List<Foo>::Reader`) or `Foo::Builder`
(for `List<Foo>::Builder`). The builder's `set` method takes a `Foo::Reader` as its second
parameter.

For lists of lists or lists of blobs, the builder also has a method `init(index, size)` which sets
the element at the given index to a newly-allocated value with the given size and returns a builder
for it. Struct lists do not have an `init` method because all elements are initialized to empty
values when the list is created.

### Enums

Cap'n Proto enums become C++11 "enum classes". That means they behave like any other enum, but
the enum's values are scoped within the type. E.g. for an enum `Foo` with value `bar`, you must
refer to the value as `Foo::BAR`.

To match prevailing C++ style, an enum's value names are converted to UPPERCASE_WITH_UNDERSCORES
(whereas in the schema language you'd write them in camelCase).

Keep in mind when writing `switch` blocks that an enum read off the wire may have a numeric
value that is not listed in its definition. This may be the case if the sender is using a newer
version of the protocol, or if the message is corrupt or malicious.
In C++11, enums are allowed
to have any value that is within the range of their base type, which for Cap'n Proto enums is
`uint16_t`.

### Blobs (Text and Data)

Blobs are manipulated using the classes `capnp::Text` and `capnp::Data`. These classes are,
again, just containers for inner classes `Reader` and `Builder`. These classes are iterable and
implement `size()` and `operator[]` methods. `Builder::operator[]` even returns a reference
(unlike with `List<T>`). `Text::Reader` additionally has a method `cStr()` which returns a
NUL-terminated `const char*`.

As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying
type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format. This is
accomplished without actually `#include`ing `<string>`, since some clients do not want to rely
on this rather-bulky header. In fact, any class which defines a `.c_str()` method will be
implicitly convertible in this way. Unfortunately, this trick doesn't work on GCC 4.7.

### Interfaces

[Interfaces (RPC) have their own page.](cxxrpc.html)

### Generics

[Generic types](language.html#generic-types) become templates in C++. The outer type (the one whose
name matches the schema declaration's name) is templatized; the inner `Reader` and `Builder` types
are not, because they inherit the parameters from the outer type. Similarly, template parameters
should refer to outer types, not `Reader` or `Builder` types.

For example, given:

{% highlight capnp %}
struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}

struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}
{% endhighlight %}

You might write code like:

{% highlight c++ %}
void processPeople(People::Reader people) {
  Map<Text, Person>::Reader reader = people.getByName();
  capnp::List<Map<Text, Person>::Entry>::Reader entries =
      reader.getEntries();
  for (auto entry: entries) {
    processPerson(entry.getValue());
  }
}
{% endhighlight %}

Note that all template parameters will be specified with a default value of `AnyPointer`.
Therefore, the type `Map<>` is equivalent to `Map<capnp::AnyPointer, capnp::AnyPointer>`.

### Constants

Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style
(whereas in the schema language you’d write them in camelCase). Primitive constants are just
`constexpr` values. Pointer-type constants (e.g. structs, lists, and blobs) are represented
using a proxy object that can be converted to the relevant `Reader` type, either implicitly or
using the unary `*` or `->` operators.

## Messages and I/O

To create a new message, you must start by creating a `capnp::MessageBuilder`
(`capnp/message.h`). This is an abstract type which you can implement yourself, but most users
will want to use `capnp::MallocMessageBuilder`.
Once your message is constructed, write it to
a file descriptor with `capnp::writeMessageToFd(fd, builder)` (`capnp/serialize.h`) or
`capnp::writePackedMessageToFd(fd, builder)` (`capnp/serialize-packed.h`).

To read a message, you must create a `capnp::MessageReader`, which is another abstract type.
Implementations are specific to the data source. You can use `capnp::StreamFdMessageReader`
(`capnp/serialize.h`) or `capnp::PackedFdMessageReader` (`capnp/serialize-packed.h`)
to read from file descriptors; both take the file descriptor as a constructor argument.

Note that if your stream contains additional data after the message, `PackedFdMessageReader` may
accidentally read some of that data, since it does buffered I/O. To make this work correctly, you
will need to set up a multi-use buffered stream. Buffered I/O may also be a good idea with
`StreamFdMessageReader` and also when writing, for performance reasons. See `capnp/io.h` for
details.

There is an [example](#example-usage) of all this at the beginning of this page.

### Using mmap

Cap'n Proto can be used together with `mmap()` (or Win32's `MapViewOfFile()`) for extremely fast
reads, especially when you only need to use a subset of the data in the file. Currently,
Cap'n Proto is not well-suited for _writing_ via `mmap()`, only reading, but this is only because
we have not yet invented a mutable segment framing format -- the underlying design should
eventually work for both.

To take advantage of `mmap()` at read time, write your file in regular serialized (but NOT packed)
format -- that is, use `writeMessageToFd()`, _not_ `writePackedMessageToFd()`.
Now, `mmap()` in
the entire file, and then pass the mapped memory to the constructor of
`capnp::FlatArrayMessageReader` (defined in `capnp/serialize.h`). That's it. You can use the
reader just like a normal `StreamFdMessageReader`. The operating system will automatically page
in data from disk as you read it.

`mmap()` works best when reading from flash media, or when the file is already hot in cache.
It works less well with slow rotating disks, where disk seeks make random access relatively
expensive. Also, if I/O throughput is your bottleneck, then the fact that memory-mapped data cannot
be packed or compressed may hurt you. However, it all depends on what fraction of the file you're
actually reading -- if you only pull one field out of one deeply-nested struct in a huge tree, it
may still be a win. The only way to know for sure is to do benchmarks! (But be careful to make
sure your benchmark is actually interacting with disk and not cache.)

## Dynamic Reflection

Sometimes you want to write generic code that operates on arbitrary types, iterating over the
fields or looking them up by name. For example, you might want to write code that encodes
arbitrary Cap'n Proto types in JSON format. This requires something like "reflection", but C++
does not offer reflection. Also, you might even want to operate on types that aren't compiled
into the binary at all, but only discovered at runtime.

The C++ API supports inspecting schemas at runtime via the interface defined in
`capnp/schema.h`, and dynamically reading and writing instances of arbitrary types via
`capnp/dynamic.h`.
Here's the example from the beginning of this file rewritten in terms
of the dynamic API:

{% highlight c++ %}
#include "addressbook.capnp.h"
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>
#include <capnp/schema.h>
#include <capnp/dynamic.h>

using ::capnp::DynamicValue;
using ::capnp::DynamicStruct;
using ::capnp::DynamicEnum;
using ::capnp::DynamicList;
using ::capnp::List;
using ::capnp::Schema;
using ::capnp::StructSchema;
using ::capnp::EnumSchema;

using ::capnp::Void;
using ::capnp::Text;
using ::capnp::MallocMessageBuilder;
using ::capnp::PackedFdMessageReader;

void dynamicWriteAddressBook(int fd, StructSchema schema) {
  // Write a message using the dynamic API to set each
  // field by text name. This isn't something you'd
  // normally want to do; it's just for illustration.

  MallocMessageBuilder message;

  // Types shown for explanation purposes; normally you'd
  // use auto.
  DynamicStruct::Builder addressBook =
      message.initRoot<DynamicStruct>(schema);

  DynamicList::Builder people =
      addressBook.init("people", 2).as<DynamicList>();

  DynamicStruct::Builder alice =
      people[0].as<DynamicStruct>();
  alice.set("id", 123);
  alice.set("name", "Alice");
  alice.set("email", "alice@example.com");
  auto alicePhones = alice.init("phones", 1).as<DynamicList>();
  auto phone0 = alicePhones[0].as<DynamicStruct>();
  phone0.set("number", "555-1212");
  phone0.set("type", "mobile");
  alice.get("employment").as<DynamicStruct>()
       .set("school", "MIT");

  auto bob = people[1].as<DynamicStruct>();
  bob.set("id", 456);
  bob.set("name", "Bob");
  bob.set("email", "bob@example.com");

  // Some magic: We can convert a dynamic sub-value back to
  // the native type with as<T>()!
  List<Person::PhoneNumber>::Builder bobPhones =
      bob.init("phones", 2).as<List<Person::PhoneNumber>>();
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.get("employment").as<DynamicStruct>()
     .set("unemployed", ::capnp::VOID);

  writePackedMessageToFd(fd, message);
}

void dynamicPrintValue(DynamicValue::Reader value) {
  // Print an arbitrary message via the dynamic API by
  // iterating over the schema. Look at the handling
  // of STRUCT in particular.

  switch (value.getType()) {
    case DynamicValue::VOID:
      std::cout << "";
      break;
    case DynamicValue::BOOL:
      std::cout << (value.as<bool>() ? "true" : "false");
      break;
    case DynamicValue::INT:
      std::cout << value.as<int64_t>();
      break;
    case DynamicValue::UINT:
      std::cout << value.as<uint64_t>();
      break;
    case DynamicValue::FLOAT:
      std::cout << value.as<double>();
      break;
    case DynamicValue::TEXT:
      std::cout << '\"' << value.as<Text>().cStr() << '\"';
      break;
    case DynamicValue::LIST: {
      std::cout << "[";
      bool first = true;
      for (auto element: value.as<DynamicList>()) {
        if (first) {
          first = false;
        } else {
          std::cout << ", ";
        }
        dynamicPrintValue(element);
      }
      std::cout << "]";
      break;
    }
    case DynamicValue::ENUM: {
      auto enumValue = value.as<DynamicEnum>();
      KJ_IF_MAYBE(enumerant, enumValue.getEnumerant()) {
        std::cout <<
            enumerant->getProto().getName().cStr();
      } else {
        // Unknown enum value; output raw number.
        std::cout << enumValue.getRaw();
      }
      break;
    }
    case DynamicValue::STRUCT: {
      std::cout << "(";
      auto structValue = value.as<DynamicStruct>();
      bool first = true;
      for (auto field: structValue.getSchema().getFields()) {
        if (!structValue.has(field)) continue;
        if (first) {
          first = false;
        } else {
          std::cout << ", ";
        }
        std::cout << field.getProto().getName().cStr()
                  << " = ";
        dynamicPrintValue(structValue.get(field));
      }
      std::cout << ")";
      break;
    }
    default:
      // There are other types, we aren't handling them.
cannam@62: std::cout << "?"; cannam@62: break; cannam@62: } cannam@62: } cannam@62: cannam@62: void dynamicPrintMessage(int fd, StructSchema schema) { cannam@62: PackedFdMessageReader message(fd); cannam@62: dynamicPrintValue(message.getRoot(schema)); cannam@62: std::cout << std::endl; cannam@62: } cannam@62: {% endhighlight %} cannam@62: cannam@62: Notes about the dynamic API: cannam@62: cannam@62: * You can implicitly cast any compiled Cap'n Proto struct reader/builder type directly to cannam@62: `DynamicStruct::Reader`/`DynamicStruct::Builder`. Similarly with `List` and `DynamicList`, cannam@62: and even enum types and `DynamicEnum`. Finally, all valid Cap'n Proto field types may be cannam@62: implicitly converted to `DynamicValue`. cannam@62: cannam@62: * You can load schemas dynamically at runtime using `SchemaLoader` (`capnp/schema-loader.h`) and cannam@62: use the Dynamic API to manipulate objects of these types. `MessageBuilder` and `MessageReader` cannam@62: have methods for accessing the message root using a dynamic schema. cannam@62: cannam@62: * While `SchemaLoader` loads binary schemas, you can also parse directly from text using cannam@62: `SchemaParser` (`capnp/schema-parser.h`). However, this requires linking against `libcapnpc` cannam@62: (in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient. If cannam@62: you can arrange to use only binary schemas at runtime, you'll be better off. cannam@62: cannam@62: * Unlike with Protobufs, there is no "global registry" of compiled-in types. To get the schema cannam@62: for a compiled-in type, use `capnp::Schema::from()`. cannam@62: cannam@62: * Unlike with Protobufs, the overhead of supporting reflection is small. Generated `.capnp.c++` cannam@62: files contain only some embedded const data structures describing the schema, no code at all, cannam@62: and the runtime library support code is relatively small. 
  Moreover, if you do not use the dynamic API or the schema API, you do not even need to link
  their implementations into your executable.

* The dynamic API performs type checks at runtime. In case of error, it will throw an exception.
  If you compile with `-fno-exceptions`, it will crash instead. Correct usage of the API should
  never throw, but bugs happen. Enabling and catching exceptions will make your code more robust.

* Loading user-provided schemas has security implications: it greatly increases the attack
  surface of the Cap'n Proto library. In particular, it is easy for an attacker to trigger
  exceptions. To protect yourself, you are strongly advised to enable exceptions and catch them.

## Orphans

An "orphan" is a Cap'n Proto object that is disconnected from the message structure. That is,
it is not the root of a message, and there is no other Cap'n Proto object holding a pointer to it.
Thus, it has no parents. Orphans are an advanced feature that can help avoid copies and make it
easier to use Cap'n Proto objects as part of your application's internal state. Typical
applications probably won't use orphans.

The class `capnp::Orphan<T>` (defined in `<capnp/orphan.h>`) represents a pointer to an orphaned
object of type `T`. `T` can be any struct type, `List<T>`, `Text`, or `Data`. E.g.
`capnp::Orphan<Person>` would be an orphaned `Person` structure. `Orphan<T>` is a move-only class,
similar to `std::unique_ptr<T>`. This prevents two different objects from adopting the same
orphan, which would result in an invalid message.

An orphan can be "adopted" by another object to link it into the message structure. Conversely,
an object can "disown" one of its pointers, causing the pointed-to object to become an orphan.
Every pointer-typed field `foo` provides builder methods `adoptFoo()` and `disownFoo()` for these
purposes. Again, these methods use C++11 move semantics. To use them, you will need to be
familiar with `std::move()` (or the equivalent but shorter-named `kj::mv()`).

Even though an orphan is unlinked from the message tree, it still resides inside memory allocated
for a particular message (i.e. a particular `MessageBuilder`). An orphan can only be adopted by
objects that live in the same message. To move objects between messages, you must perform a copy.
If the message is serialized while an `Orphan<T>` living within it still exists, the orphan's
content will be part of the serialized message, but the only way the receiver could find it is by
investigating the raw message; the Cap'n Proto API provides no way to detect or read it.

To construct an orphan from scratch (without having some other object disown it), you need an
`Orphanage`, which is essentially an orphan factory associated with some message. You can get one
by calling the `MessageBuilder`'s `getOrphanage()` method, or by calling the static method
`Orphanage::getForMessageContaining(builder)` and passing it any struct or list builder.

Note that when an `Orphan<T>` goes out of scope without being adopted, the underlying memory that
it occupied is overwritten with zeros. If you use packed serialization, these zeros will take very
little bandwidth on the wire, but will still waste memory on the sending and receiving ends.
Generally, you should avoid allocating message objects that won't be used, or if you cannot avoid
it, arrange to copy the entire message over to a new `MessageBuilder` before serializing, since
only the reachable objects will be copied.
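
The ownership rules described in this section can be illustrated without the Cap'n Proto library. The following is only a toy sketch: `ToyOrphan` and `ToyPerson` are hypothetical classes, not part of Cap'n Proto, and they model nothing but the move-only adopt/disown handoff. Message memory, orphanages, and the zeroing of abandoned orphans are deliberately left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

class ToyPerson;  // hypothetical stand-in for a generated struct builder

// Hypothetical stand-in for capnp::Orphan<T>: a move-only handle to a value
// that is currently not linked into any parent object.
template <typename T>
class ToyOrphan {
public:
  ToyOrphan() = default;
  explicit ToyOrphan(T value) : value_(new T(std::move(value))) {}

  // Declaring the move operations suppresses the implicit copy operations,
  // so the same orphan can never be handed to two parents.
  ToyOrphan(ToyOrphan&&) = default;
  ToyOrphan& operator=(ToyOrphan&&) = default;

  bool isNull() const { return value_ == nullptr; }
  const T& get() const {
    assert(!isNull());
    return *value_;
  }

private:
  friend class ToyPerson;
  std::unique_ptr<T> value_;
};

static_assert(!std::is_copy_constructible<ToyOrphan<std::string>>::value,
              "ToyOrphan must be move-only");

// Toy parent with a single pointer-typed field, `name`.
class ToyPerson {
public:
  bool hasName() const { return name_ != nullptr; }
  const std::string& getName() const {
    assert(hasName());
    return *name_;
  }

  // Links the orphan into this object; the caller's handle becomes null.
  void adoptName(ToyOrphan<std::string>&& orphan) {
    name_ = std::move(orphan.value_);
  }

  // Unlinks the field and returns it as an orphan; the field becomes null.
  ToyOrphan<std::string> disownName() {
    ToyOrphan<std::string> orphan;
    orphan.value_ = std::move(name_);
    return orphan;
  }

private:
  std::unique_ptr<std::string> name_;
};
```

In this model, `bob.adoptName(alice.disownName())` transfers the name from one parent to the other without copying the string, and an attempt to adopt the same `ToyOrphan` twice does not compile unless the caller explicitly moves from an already-emptied handle.
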

## Reference

The runtime library contains lots of useful features not described on this page. For now, the
best reference is the header files. See:

    capnp/list.h
    capnp/blob.h
    capnp/message.h
    capnp/serialize.h
    capnp/serialize-packed.h
    capnp/schema.h
    capnp/schema-loader.h
    capnp/dynamic.h

## Tips and Best Practices

Here are some tips for using the C++ Cap'n Proto runtime most effectively:

* Accessor methods for primitive (non-pointer) fields are fast and inline. They should be just
  as fast as accessing a struct field through a pointer.

* Accessor methods for pointer fields, on the other hand, are not inline, as they need to validate
  the pointer. If you intend to access the same pointer multiple times, it is a good idea to
  save the value to a local variable to avoid repeating this work. This is generally not a
  problem given C++11's `auto`.

  Example:

      // BAD
      frob(foo.getBar().getBaz(),
           foo.getBar().getQux(),
           foo.getBar().getCorge());

      // GOOD
      auto bar = foo.getBar();
      frob(bar.getBaz(), bar.getQux(), bar.getCorge());

  It is especially important to use this style when reading messages, for another reason: as
  described under the "security tips" section, below, every time you `get` a pointer, Cap'n Proto
  increments a counter by the size of the target object. If that counter hits a pre-defined limit,
  an exception is thrown (or a default value is returned, if exceptions are disabled), to prevent
  a malicious client from sending your server into an infinite loop with a specially-crafted
  message.
  If you repeatedly `get` the same object, you are repeatedly counting the same bytes,
  and so you may hit the limit prematurely. (Since Cap'n Proto readers are backed directly by
  the underlying message buffer and do not have anywhere else to store per-object information, it
  is impossible to remember whether you've seen a particular object already.)

* Internally, all pointer fields start out "null", even if they have default values. When you have
  a pointer field `foo` and you call `getFoo()` on the containing struct's `Reader`, if the field
  is "null", you will receive a reader for that field's default value. This reader is backed by
  read-only memory; nothing is allocated. However, when you call `get` on a _builder_, and the
  field is null, then the implementation must make a _copy_ of the default value to return to you.
  Thus, you've caused the field to become non-null, just by "reading" it. On the other hand, if
  you call `init` on that field, you are explicitly replacing whatever value is already there
  (null or not) with a newly-allocated instance, and that newly-allocated instance is _not_ a
  copy of the field's default value, but just a completely-uninitialized instance of the
  appropriate type.

* It is possible to receive a struct value constructed from a newer version of the protocol than
  the one your binary was built with, and that struct might have extra fields that you don't know
  about. The Cap'n Proto implementation tries to avoid discarding this extra data. If you copy
  the struct from one message to another (e.g. by calling a `set()` method on a parent object), the
  extra fields will be preserved.
  This makes it possible to build proxies that receive messages
  and forward them on without having to rebuild the proxy every time a new field is added. You
  must be careful, however: in some cases, it's not possible to retain the extra fields, because
  they need to be copied into a space that is allocated before the expected content is known.
  In particular, lists of structs are represented as a flat array, not as an array of pointers.
  Therefore, all memory for all structs in the list must be allocated upfront. Hence, copying
  a struct value from another message into an element of a list will truncate the value. Because
  of this, the setter method for struct lists is called `setWithCaveats()` rather than just `set()`.

* Messages are built in "arena" or "region" style: each object is allocated sequentially in
  memory, until there is no more room in the segment, in which case a new segment is allocated,
  and objects continue to be allocated sequentially in that segment. This design is what makes
  Cap'n Proto possible at all, and it is very fast compared to other allocation strategies.
  However, it has the disadvantage that if you allocate an object and then discard it, that memory
  is lost. In fact, the empty space will still become part of the serialized message, even though
  it is unreachable. The implementation will try to zero it out, so at least it should pack well,
  but it's still better to avoid this situation. Some ways that this can happen include:
  * If you `init` a field that is already initialized, the previous value is discarded.
  * If you create an orphan that is never adopted into the message tree.
  * If you use `adoptWithCaveats` to adopt an orphaned struct into a struct list, then a shallow
    copy is necessary, since the struct list requires that its elements are sequential in memory.
    The previous copy of the struct is discarded (although child objects are transferred properly).
  * If you copy a struct value from another message using a `set` method, the copy will have the
    same size as the original. However, the original could have been built with an older version
    of the protocol which lacked some fields compared to the version your program was built with.
    If you subsequently `get` that struct, the implementation will be forced to allocate a new
    (shallow) copy which is large enough to hold all known fields, and the old copy will be
    discarded. Child objects will be transferred over without being copied -- though they might
    suffer from the same problem if you `get` them later on.

  Sometimes, avoiding these problems is too inconvenient. Fortunately, it's also possible to
  clean up the mess after the fact: if you copy the whole message tree into a fresh
  `MessageBuilder`, only the reachable objects will be copied, leaving out all of the unreachable
  dead space.

  In the future, Cap'n Proto may be improved such that it can re-use dead space in a message.
  However, this will only improve things, not fix them entirely: fragmentation could still leave
  dead space.

### Build Tips

* If you are worried about the binary footprint of the Cap'n Proto library, consider statically
  linking with the `--gc-sections` linker flag. This will allow the linker to drop pieces of the
  library that you do not actually use.
  For example, many users do not use the dynamic schema and
  reflection APIs, which contribute a large fraction of the Cap'n Proto library's overall
  footprint. Keep in mind that if you ever stringify a Cap'n Proto type, the stringification code
  depends on the dynamic API; consider only using stringification in debug builds.

  If you are dynamically linking against the system's shared copy of `libcapnp`, don't worry about
  its binary size. Remember that only the code which you actually use will be paged into RAM, and
  those pages are shared with other applications on the system.

  Also remember to strip your binary. In particular, `libcapnpc` (the schema parser) has
  excessively large symbol names caused by its use of template-based parser combinators. Stripping
  the binary greatly reduces its size.

* The Cap'n Proto library has lots of debug-only asserts that are removed if you `#define NDEBUG`,
  including in headers. If you care at all about performance, you should compile your production
  binaries with the `-DNDEBUG` compiler flag. In fact, if Cap'n Proto detects that you have
  optimization enabled but have not defined `NDEBUG`, it will define it for you (with a warning),
  unless you define `DEBUG` or `KJ_DEBUG` to explicitly request debugging.

### Security Tips

Cap'n Proto has not yet undergone security review. It most likely has some vulnerabilities. You
should not attempt to decode Cap'n Proto messages from sources you don't trust at this time.

However, assuming the Cap'n Proto implementation hardens up eventually, then the following security
tips will apply.

* It is highly recommended that you enable exceptions.
  When compiled with `-fno-exceptions`,
  Cap'n Proto categorizes exceptions into "fatal" and "recoverable" varieties. Fatal exceptions
  cause the server to crash, while recoverable exceptions are handled by logging an error and
  returning a "safe" garbage value. Fatal is preferred in cases where it's unclear what kind of
  garbage value would constitute "safe". The more of the library you use, the higher the chance
  that you will leave yourself open to the possibility that an attacker could trigger a fatal
  exception somewhere. If you enable exceptions, then you can catch the exception instead of
  crashing, and return an error just to the attacker rather than to everyone using your server.

  Basic parsing of Cap'n Proto messages shouldn't ever trigger fatal exceptions (assuming the
  implementation is not buggy). However, the dynamic API -- especially if you are loading schemas
  controlled by the attacker -- is much more exception-happy. If you cannot use exceptions, then
  you are advised to avoid the dynamic API when dealing with untrusted data.

* If you need to process schemas from untrusted sources, take them in binary format, not text.
  The text parser is a much larger attack surface and not designed to be secure. For instance,
  as of this writing, it is trivial to deadlock the parser by simply writing a constant whose value
  depends on itself.

* Cap'n Proto automatically applies two artificial limits on messages for security reasons:
  a limit on nesting depth, and a limit on total bytes traversed.

  * The nesting depth limit is designed to prevent stack overflow when handling a deeply-nested
    recursive type, and defaults to 64.
    If your types aren't recursive, it is highly unlikely
    that you would ever hit this limit, and even if they are recursive, it's still unlikely.

  * The traversal limit is designed to defend against maliciously-crafted messages which use
    pointer cycles or overlapping objects to make a message appear much larger than it looks off
    the wire. While cycles and overlapping objects are illegal, they are hard to detect reliably.
    Instead, Cap'n Proto places a limit on how many bytes worth of objects you can _dereference_
    before it throws an exception. This limit is assessed every time you follow a pointer. By
    default, the limit is 64MiB (this may change in the future). `StreamFdMessageReader` will
    actually reject upfront any message which is larger than the traversal limit, even before you
    start reading it.

    If you need to write your code in such a way that you might frequently re-read the same
    pointers, instead of increasing the traversal limit to the point where it is no longer useful,
    consider simply copying the message into a new `MallocMessageBuilder` before starting. Then,
    the traversal limit will be enforced only during the copy. There is no traversal limit on
    objects once they live in a `MessageBuilder`, even if you use `.asReader()` to convert a
    particular object's builder to the corresponding reader type.

  Both limits may be increased using `capnp::ReaderOptions`, defined in `capnp/message.h`.

* Remember that enums on the wire may have a numeric value that does not match any value defined
  in the schema. Your `switch()` statements must always have a safe default case.
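
The last tip can be sketched in plain C++. This is a minimal illustration under stated assumptions: `PhoneType` is a hand-written stand-in for the enum that would be generated from `Person.PhoneNumber.Type`, and `describePhoneType` is a hypothetical helper. The raw value 7 simulates an enumerant that was added in a newer schema or injected by an attacker.

```cpp
#include <cstdint>
#include <string>

// Hand-written stand-in for a generated enum (assumption: a scoped enum with
// a 16-bit underlying type, so any 16-bit value can be represented).
enum class PhoneType : std::uint16_t { MOBILE = 0, HOME = 1, WORK = 2 };

// Hypothetical helper: maps known enumerants to names and falls back to the
// raw number for values that this build of the schema does not know about.
std::string describePhoneType(PhoneType type) {
  switch (type) {
    case PhoneType::MOBILE: return "mobile";
    case PhoneType::HOME:   return "home";
    case PhoneType::WORK:   return "work";
    default:
      // Safe default: never assume the value is one of the cases above.
      return "unknown(" + std::to_string(static_cast<unsigned>(type)) + ")";
  }
}
```

With this helper, `describePhoneType(static_cast<PhoneType>(7))` yields `"unknown(7)"` instead of falling off the end of the `switch`.
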

## Lessons Learned from Protocol Buffers

The author of Cap'n Proto's C++ implementation previously wrote version 2 of Google's
Protocol Buffers. As a result, Cap'n Proto's implementation benefits from a number of lessons
learned the hard way:

* Protobuf generated code is enormous due to the parsing and serializing code generated for every
  class. This actually poses a significant problem in practice -- there exist server binaries
  containing literally hundreds of megabytes of compiled protobuf code. Cap'n Proto generated code,
  on the other hand, is almost entirely inlined accessors. The only things that go into `.capnp.o`
  files are default values for pointer fields (if needed, which is rare) and the encoded schema
  (just the raw bytes of a Cap'n-Proto-encoded schema structure). The latter could even be removed
  if you don't use dynamic reflection.

* The C++ Protobuf implementation used lots of dynamic initialization code (that runs before
  `main()`) to do things like register types in global tables. This proved problematic for
  programs which linked in lots of protocols but needed to start up quickly. Cap'n Proto does not
  use any dynamic initializers anywhere, period.

* The C++ Protobuf implementation makes heavy use of STL in its interface and implementation.
  The proliferation of template instantiations gives the Protobuf runtime library a large footprint,
  and using STL in the interface can lead to weird ABI problems and slow compiles. Cap'n Proto
  does not use any STL containers in its interface and makes sparing use in its implementation.
  As a result, the Cap'n Proto runtime library is smaller, and code that uses it compiles quickly.

* The in-memory representation of messages in Protobuf-C++ involves many heap objects. Each
  message (struct) is an object, each non-primitive repeated field allocates an array of pointers
  to more objects, and each string may actually add two heap objects. Cap'n Proto by its nature
  uses arena allocation, so the entire message is allocated in a few contiguous segments. This
  means Cap'n Proto spends very little time allocating memory, stores messages more compactly, and
  avoids memory fragmentation.

* Related to the last point, Protobuf-C++ relies heavily on object reuse for performance.
  Building or parsing into a newly-allocated Protobuf object is significantly slower than using
  an existing one. However, the memory usage of a Protobuf object will tend to grow the more times
  it is reused, particularly if it is used to parse messages of many different "shapes", so the
  objects need to be deleted and re-allocated from time to time. All this makes tuning Protobufs
  fairly tedious. In contrast, enabling memory reuse with Cap'n Proto is as simple as providing
  a byte buffer to use as scratch space when you build or read in a message. Provide enough scratch
  space to hold the entire message and Cap'n Proto won't allocate any memory. Or don't -- since
  Cap'n Proto doesn't do much allocation in the first place, the benefits of scratch space are
  small.
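
The arena-plus-scratch-space idea from the last two points can be modeled in a few lines. The following is a toy sketch, not Cap'n Proto's allocator: `ToyArena`, `allocateWords`, and `kMinSegmentWords` are hypothetical names, and the segment size is an arbitrary assumption. It hands out 8-byte words sequentially from a caller-provided buffer and touches the heap only when that buffer is exhausted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed size of each heap-allocated segment, in 8-byte words (arbitrary).
constexpr std::size_t kMinSegmentWords = 1024;

// Toy bump allocator: sequential allocation within a segment, with the first
// segment supplied by the caller as scratch space.
class ToyArena {
public:
  ToyArena(std::uint64_t* scratch, std::size_t scratchWords)
      : next_(scratch), wordsLeft_(scratchWords) {}

  std::uint64_t* allocateWords(std::size_t words) {
    if (words > wordsLeft_) {
      // Current segment is full: start a new zero-initialized heap segment.
      // Whatever was left in the old segment becomes dead space.
      std::size_t segmentWords =
          words > kMinSegmentWords ? words : kMinSegmentWords;
      heapSegments_.emplace_back(segmentWords);
      next_ = heapSegments_.back().data();
      wordsLeft_ = segmentWords;
    }
    std::uint64_t* result = next_;
    next_ += words;
    wordsLeft_ -= words;
    return result;
  }

  // Number of segments that had to come from the heap.
  std::size_t heapSegmentCount() const { return heapSegments_.size(); }

private:
  std::uint64_t* next_;
  std::size_t wordsLeft_;
  std::vector<std::vector<std::uint64_t>> heapSegments_;
};
```

As long as every allocation fits in the scratch buffer, `heapSegmentCount()` stays at zero, which is the "won't allocate any memory" case described above. The real constructors that accept scratch space are declared in the headers listed under Reference, such as `capnp/message.h` and `capnp/serialize.h`.
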