Mercurial > hg > sv-dependency-builds

diff src/capnproto-git-20161025/doc/cxx.md @ 48:9530b331f8c1
Add Cap'n Proto source
author: Chris Cannam <cannam@all-day-breakfast.com>
date: Tue, 25 Oct 2016 11:17:01 +0100
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/capnproto-git-20161025/doc/cxx.md	Tue Oct 25 11:17:01 2016 +0100
@@ -0,0 +1,919 @@
+---
+layout: page
+title: C++ Serialization
+---
+
+# C++ Serialization
+
+The Cap'n Proto C++ runtime implementation provides an easy-to-use interface for manipulating
+messages backed by fast pointer arithmetic.  This page discusses the serialization layer of
+the runtime; see [C++ RPC](cxxrpc.html) for information about the RPC layer.
+
+## Example Usage
+
+For the Cap'n Proto definition:
+
+{% highlight capnp %}
+struct Person {
+  id @0 :UInt32;
+  name @1 :Text;
+  email @2 :Text;
+  phones @3 :List(PhoneNumber);
+
+  struct PhoneNumber {
+    number @0 :Text;
+    type @1 :Type;
+
+    enum Type {
+      mobile @0;
+      home @1;
+      work @2;
+    }
+  }
+
+  employment :union {
+    unemployed @4 :Void;
+    employer @5 :Text;
+    school @6 :Text;
+    selfEmployed @7 :Void;
+    # We assume that a person is only one of these.
+  }
+}
+
+struct AddressBook {
+  people @0 :List(Person);
+}
+{% endhighlight %}
+
+You might write code like:
+
+{% highlight c++ %}
+#include "addressbook.capnp.h"
+#include <capnp/message.h>
+#include <capnp/serialize-packed.h>
+#include <iostream>
+
+void writeAddressBook(int fd) {
+  ::capnp::MallocMessageBuilder message;
+
+  AddressBook::Builder addressBook = message.initRoot<AddressBook>();
+  ::capnp::List<Person>::Builder people = addressBook.initPeople(2);
+
+  Person::Builder alice = people[0];
+  alice.setId(123);
+  alice.setName("Alice");
+  alice.setEmail("alice@example.com");
+  // Type shown for explanation purposes; normally you'd use auto.
+  ::capnp::List<Person::PhoneNumber>::Builder alicePhones =
+      alice.initPhones(1);
+  alicePhones[0].setNumber("555-1212");
+  alicePhones[0].setType(Person::PhoneNumber::Type::MOBILE);
+  alice.getEmployment().setSchool("MIT");
+
+  Person::Builder bob = people[1];
+  bob.setId(456);
+  bob.setName("Bob");
+  bob.setEmail("bob@example.com");
+  auto bobPhones = bob.initPhones(2);
+  bobPhones[0].setNumber("555-4567");
+  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
+  bobPhones[1].setNumber("555-7654");
+  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
+  bob.getEmployment().setUnemployed();
+
+  writePackedMessageToFd(fd, message);
+}
+
+void printAddressBook(int fd) {
+  ::capnp::PackedFdMessageReader message(fd);
+
+  AddressBook::Reader addressBook = message.getRoot<AddressBook>();
+
+  for (Person::Reader person : addressBook.getPeople()) {
+    std::cout << person.getName().cStr() << ": "
+              << person.getEmail().cStr() << std::endl;
+    for (Person::PhoneNumber::Reader phone: person.getPhones()) {
+      const char* typeName = "UNKNOWN";
+      switch (phone.getType()) {
+        case Person::PhoneNumber::Type::MOBILE: typeName = "mobile"; break;
+        case Person::PhoneNumber::Type::HOME: typeName = "home"; break;
+        case Person::PhoneNumber::Type::WORK: typeName = "work"; break;
+      }
+      std::cout << "  " << typeName << " phone: "
+                << phone.getNumber().cStr() << std::endl;
+    }
+    Person::Employment::Reader employment = person.getEmployment();
+    switch (employment.which()) {
+      case Person::Employment::UNEMPLOYED:
+        std::cout << "  unemployed" << std::endl;
+        break;
+      case Person::Employment::EMPLOYER:
+        std::cout << "  employer: "
+                  << employment.getEmployer().cStr() << std::endl;
+        break;
+      case Person::Employment::SCHOOL:
+        std::cout << "  student at: "
+                  << employment.getSchool().cStr() << std::endl;
+        break;
+      case Person::Employment::SELF_EMPLOYED:
+        std::cout << "  self-employed" << std::endl;
+        break;
+    }
+  }
+}
+{% endhighlight %}
+
+## C++ Feature Usage:  C++11, Exceptions
+
+This implementation makes use of C++11 features.  If you are using GCC, you will need at least
+version 4.7 to compile Cap'n Proto.  If you are using Clang, you will need at least version 3.2.
+These compilers required the flag `-std=c++11` to enable C++11 features -- your code which
+`#include`s Cap'n Proto headers will need to be compiled with this flag.  Other compilers have not
+been tested at this time.
+
+This implementation prefers to handle errors using exceptions.  Exceptions are only used in
+circumstances that should never occur in normal operation.  For example, exceptions are thrown
+on assertion failures (indicating bugs in the code), network failures, and invalid input.
+Exceptions thrown by Cap'n Proto are never part of the interface and never need to be caught in
+correct usage.  The purpose of throwing exceptions is to allow higher-level code a chance to
+recover from unexpected circumstances without disrupting other work happening in the same process.
+For example, a server that handles requests from multiple clients should, on exception, return an
+error to the client that caused the exception and close that connection, but should continue
+handling other connections normally.
+
+When Cap'n Proto code might throw an exception from a destructor, it first checks
+`std::uncaught_exception()` to ensure that this is safe.  If another exception is already active,
+the new exception is assumed to be a side-effect of the main exception, and is either silently
+swallowed or reported on a side channel.
+
+In recognition of the fact that some teams prefer not to use exceptions, and that even enabling
+exceptions in the compiler introduces overhead, Cap'n Proto allows you to disable them entirely
+by registering your own exception callback.  The callback will be called in place of throwing an
+exception.  The callback may abort the process, and is required to do so in certain circumstances
+(when a fatal bug is detected).  If the callback returns normally, Cap'n Proto will attempt
+to continue by inventing "safe" values.  This will lead to garbage output, but at least the program
+will not crash.  Your exception callback should set some sort of a flag indicating that an error
+occurred, and somewhere up the stack you should check for that flag and cancel the operation.
+See the header `kj/exception.h` for details on how to register an exception callback.
+
+## KJ Library
+
+Cap'n Proto is built on top of a basic utility library called KJ.  The two were actually developed
+together -- KJ is simply the stuff which is not specific to Cap'n Proto serialization, and may be
+useful to others independently of Cap'n Proto.  For now, the the two are distributed together.  The
+name "KJ" has no particular meaning; it was chosen to be short and easy-to-type.
+
+As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library.  You may need
+to explicitly link against libraries:  `-lcapnp -lkj`
+
+## Generating Code
+
+To generate C++ code from your `.capnp` [interface definition](language.html), run:
+
+    capnp compile -oc++ myproto.capnp
+
+This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.
+
+To use this code in your app, you must link against both `libcapnp` and `libkj`.  If you use
+`pkg-config`, Cap'n Proto provides the `capnp` module to simplify discovery of compiler and linker
+flags.
+
+If you use [RPC](cxxrpc.html) (i.e., your schema defines [interfaces](language.html#interfaces)),
+then you will additionally nead to link against `libcapnp-rpc` and `libkj-async`, or use the
+`capnp-rpc` `pkg-config` module.
+
+### Setting a Namespace
+
+You probably want your generated types to live in a C++ namespace.  You will need to import
+`/capnp/c++.capnp` and use the `namespace` annotation it defines:
+
+{% highlight capnp %}
+using Cxx = import "/capnp/c++.capnp";
+$Cxx.namespace("foo::bar::baz");
+{% endhighlight %}
+
+Note that `capnp/c++.capnp` is installed in `$PREFIX/include` (`/usr/local/include` by default)
+when you install the C++ runtime.  The `capnp` tool automatically searches `/usr/include` and
+`/usr/local/include` for imports that start with a `/`, so it should "just work".  If you installed
+somewhere else, you may need to add it to the search path with the `-I` flag to `capnp compile`,
+which works much like the compiler flag of the same name.
+
+## Types
+
+### Primitive Types
+
+Primitive types map to the obvious C++ types:
+
+* `Bool` -> `bool`
+* `IntNN` -> `intNN_t`
+* `UIntNN` -> `uintNN_t`
+* `Float32` -> `float`
+* `Float64` -> `double`
+* `Void` -> `::capnp::Void` (An empty struct; its only value is `::capnp::VOID`)
+
+### Structs
+
+For each struct `Foo` in your interface, a C++ type named `Foo` generated.  This type itself is
+really just a namespace; it contains two important inner classes:  `Reader` and `Builder`.
+
+`Reader` represents a read-only instance of `Foo` while `Builder` represents a writable instance
+(usually, one that you are building).  Both classes behave like pointers, in that you can pass them
+by value and they do not own the underlying data that they operate on.  In other words,
+`Foo::Builder` is like a pointer to a `Foo` while `Foo::Reader` is like a const pointer to a `Foo`.
+
+For every field `bar` defined in `Foo`, `Foo::Reader` has a method `getBar()`.  For primitive types,
+`get` just returns the type, but for structs, lists, and blobs, it returns a `Reader` for the
+type.
+
+{% highlight c++ %}
+// Example Reader methods:
+
+// myPrimitiveField @0 :Int32;
+int32_t getMyPrimitiveField();
+
+// myTextField @1 :Text;
+::capnp::Text::Reader getMyTextField();
+// (Note that Text::Reader may be implicitly cast to const char* and
+// std::string.)
+
+// myStructField @2 :MyStruct;
+MyStruct::Reader getMyStructField();
+
+// myListField @3 :List(Float64);
+::capnp::List<double> getMyListField();
+{% endhighlight %}
+
+`Foo::Builder`, meanwhile, has several methods for each field `bar`:
+
+* `getBar()`:  For primitives, returns the value.  For composites, returns a Builder for the
+  composite.  If a composite field has not been initialized (i.e. this is the first time it has
+  been accessed), it will be initialized to a copy of the field's default value before returning.
+* `setBar(x)`:  For primitives, sets the value to x.  For composites, sets the value to a deep copy
+  of x, which must be a Reader for the type.
+* `initBar(n)`:  Only for lists and blobs.  Sets the field to a newly-allocated list or blob
+  of size n and returns a Builder for it.  The elements of the list are initialized to their empty
+  state (zero for numbers, default values for structs).
+* `initBar()`:  Only for structs.  Sets the field to a newly-allocated struct and returns a
+  Builder for it.  Note that the newly-allocated struct is initialized to the default value for
+  the struct's _type_ (i.e., all-zero) rather than the default value for the field `bar` (if it
+  has one).
+* `hasBar()`:  Only for pointer fields (e.g. structs, lists, blobs).  Returns true if the pointer
+  has been initialized (non-null).  (This method is also available on readers.)
+* `adoptBar(x)`:  Only for pointer fields.  Adopts the orphaned object x, linking it into the field
+  `bar` without copying.  See the section on orphans.
+* `disownBar()`:  Disowns the value pointed to by `bar`, setting the pointer to null and returning
+  its previous value as an orphan.  See the section on orphans.
+
+{% highlight c++ %}
+// Example Builder methods:
+
+// myPrimitiveField @0 :Int32;
+int32_t getMyPrimitiveField();
+void setMyPrimitiveField(int32_t value);
+
+// myTextField @1 :Text;
+::capnp::Text::Builder getMyTextField();
+void setMyTextField(::capnp::Text::Reader value);
+::capnp::Text::Builder initMyTextField(size_t size);
+// (Note that Text::Reader is implicitly constructable from const char*
+// and std::string, and Text::Builder can be implicitly cast to
+// these types.)
+
+// myStructField @2 :MyStruct;
+MyStruct::Builder getMyStructField();
+void setMyStructField(MyStruct::Reader value);
+MyStruct::Builder initMyStructField();
+
+// myListField @3 :List(Float64);
+::capnp::List<double>::Builder getMyListField();
+void setMyListField(::capnp::List<double>::Reader value);
+::capnp::List<double>::Builder initMyListField(size_t size);
+{% endhighlight %}
+
+### Groups
+
+Groups look a lot like a combination of a nested type and a field of that type, except that you
+cannot set, adopt, or disown a group -- you can only get and init it.
+
+### Unions
+
+A named union (as opposed to an unnamed one) works just like a group, except with some additions:
+
+* For each field `foo`, the union reader and builder have a method `isFoo()` which returns true
+  if `foo` is the currently-set field in the union.
+* The union reader and builder also have a method `which()` that returns an enum value indicating
+  which field is currently set.
+* Calling the set, init, or adopt accessors for a field makes it the currently-set field.
+* Calling the get or disown accessors on a field that isn't currently set will throw an
+  exception in debug mode or return garbage when `NDEBUG` is defined.
+
+Unnamed unions differ from named unions only in that the accessor methods from the union's members
+are added directly to the containing type's reader and builder, rather than generating a nested
+type.
+
+See the [example](#example-usage) at the top of the page for an example of unions.
+
+### Lists
+
+Lists are represented by the type `capnp::List<T>`, where `T` is any of the primitive types,
+any Cap'n Proto user-defined type, `capnp::Text`, `capnp::Data`, or `capnp::List<U>`
+(to form a list of lists).
+
+The type `List<T>` itself is not instantiatable, but has two inner classes: `Reader` and `Builder`.
+As with structs, these types behave like pointers to read-only and read-write data, respectively.
+
+Both `Reader` and `Builder` implement `size()`, `operator[]`, `begin()`, and `end()`, as good C++
+containers should.  Note, though, that `operator[]` is read-only -- you cannot use it to assign
+the element, because that would require returning a reference, which is impossible because the
+underlying data may not be in your CPU's native format (e.g., wrong byte order).  Instead, to
+assign an element of a list, you must use `builder.set(index, value)`.
+
+For `List<Foo>` where `Foo` is a non-primitive type, the type returned by `operator[]` and
+`iterator::operator*()` is `Foo::Reader` (for `List<Foo>::Reader`) or `Foo::Builder`
+(for `List<Foo>::Builder`).  The builder's `set` method takes a `Foo::Reader` as its second
+parameter.
+
+For lists of lists or lists of blobs, the builder also has a method `init(index, size)` which sets
+the element at the given index to a newly-allocated value with the given size and returns a builder
+for it.  Struct lists do not have an `init` method because all elements are initialized to empty
+values when the list is created.
+
+### Enums
+
+Cap'n Proto enums become C++11 "enum classes".  That means they behave like any other enum, but
+the enum's values are scoped within the type.  E.g. for an enum `Foo` with value `bar`, you must
+refer to the value as `Foo::BAR`.
+
+To match prevaling C++ style, an enum's value names are converted to UPPERCASE_WITH_UNDERSCORES
+(whereas in the schema language you'd write them in camelCase).
+
+Keep in mind when writing `switch` blocks that an enum read off the wire may have a numeric
+value that is not listed in its definition.  This may be the case if the sender is using a newer
+version of the protocol, or if the message is corrupt or malicious.  In C++11, enums are allowed
+to have any value that is within the range of their base type, which for Cap'n Proto enums is
+`uint16_t`.
+
+### Blobs (Text and Data)
+
+Blobs are manipulated using the classes `capnp::Text` and `capnp::Data`.  These classes are,
+again, just containers for inner classes `Reader` and `Builder`.  These classes are iterable and
+implement `size()` and `operator[]` methods.  `Builder::operator[]` even returns a reference
+(unlike with `List<T>`).  `Text::Reader` additionally has a method `cStr()` which returns a
+NUL-terminated `const char*`.
+
+As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying
+type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format.  This is
+accomplished without actually `#include`ing `<string>`, since some clients do not want to rely
+on this rather-bulky header.  In fact, any class which defines a `.c_str()` method will be
+implicitly convertible in this way.  Unfortunately, this trick doesn't work on GCC 4.7.
+
+### Interfaces
+
+[Interfaces (RPC) have their own page.](cxxrpc.html)
+
+### Generics
+
+[Generic types](language.html#generic-types) become templates in C++. The outer type (the one whose
+name matches the schema declaration's name) is templatized; the inner `Reader` and `Builder` types
+are not, because they inherit the parameters from the outer type. Similarly, template parameters
+should refer to outer types, not `Reader` or `Builder` types.
+
+For example, given:
+
+{% highlight capnp %}
+struct Map(Key, Value) {
+  entries @0 :List(Entry);
+  struct Entry {
+    key @0 :Key;
+    value @1 :Value;
+  }
+}
+
+struct People {
+  byName @0 :Map(Text, Person);
+  # Maps names to Person instances.
+}
+{% endhighlight %}
+
+You might write code like:
+
+{% highlight c++ %}
+void processPeople(People::Reader people) {
+  Map<Text, Person>::Reader reader = people.getByName();
+  capnp::List<Map<Text, Person>::Entry>::Reader entries =
+      reader.getEntries()
+  for (auto entry: entries) {
+    processPerson(entry);
+  }
+}
+{% endhighlight %}
+
+Note that all template parameters will be specified with a default value of `AnyPointer`.
+Therefore, the type `Map<>` is equivalent to `Map<capnp::AnyPointer, capnp::AnyPointer>`.
+
+### Constants
+
+Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style
+(whereas in the schema language you’d write them in camelCase).  Primitive constants are just
+`constexpr` values.  Pointer-type constants (e.g. structs, lists, and blobs) are represented
+using a proxy object that can be converted to the relevant `Reader` type, either implicitly or
+using the unary `*` or `->` operators.
+
+## Messages and I/O
+
+To create a new message, you must start by creating a `capnp::MessageBuilder`
+(`capnp/message.h`).  This is an abstract type which you can implement yourself, but most users
+will want to use `capnp::MallocMessageBuilder`.  Once your message is constructed, write it to
+a file descriptor with `capnp::writeMessageToFd(fd, builder)` (`capnp/serialize.h`) or
+`capnp::writePackedMessageToFd(fd, builder)` (`capnp/serialize-packed.h`).
+
+To read a message, you must create a `capnp::MessageReader`, which is another abstract type.
+Implementations are specific to the data source.  You can use `capnp::StreamFdMessageReader`
+(`capnp/serialize.h`) or `capnp::PackedFdMessageReader` (`capnp/serialize-packed.h`)
+to read from file descriptors; both take the file descriptor as a constructor argument.
+
+Note that if your stream contains additional data after the message, `PackedFdMessageReader` may
+accidentally read some of that data, since it does buffered I/O.  To make this work correctly, you
+will need to set up a multi-use buffered stream.  Buffered I/O may also be a good idea with
+`StreamFdMessageReader` and also when writing, for performance reasons.  See `capnp/io.h` for
+details.
+
+There is an [example](#example-usage) of all this at the beginning of this page.
+
+### Using mmap
+
+Cap'n Proto can be used together with `mmap()` (or Win32's `MapViewOfFile()`) for extremely fast
+reads, especially when you only need to use a subset of the data in the file.  Currently,
+Cap'n Proto is not well-suited for _writing_ via `mmap()`, only reading, but this is only because
+we have not yet invented a mutable segment framing format -- the underlying design should
+eventually work for both.
+
+To take advantage of `mmap()` at read time, write your file in regular serialized (but NOT packed)
+format -- that is, use `writeMessageToFd()`, _not_ `writePackedMessageToFd()`.  Now, `mmap()` in
+the entire file, and then pass the mapped memory to the constructor of
+`capnp::FlatArrayMessageReader` (defined in `capnp/serialize.h`).  That's it.  You can use the
+reader just like a normal `StreamFdMessageReader`.  The operating system will automatically page
+in data from disk as you read it.
+
+`mmap()` works best when reading from flash media, or when the file is already hot in cache.
+It works less well with slow rotating disks.  Here, disk seeks make random access relatively
+expensive.  Also, if I/O throughput is your bottleneck, then the fact that mmaped data cannot
+be packed or compressed may hurt you.  However, it all depends on what fraction of the file you're
+actually reading -- if you only pull one field out of one deeply-nested struct in a huge tree, it
+may still be a win.  The only way to know for sure is to do benchmarks!  (But be careful to make
+sure your benchmark is actually interacting with disk and not cache.)
+
+## Dynamic Reflection
+
+Sometimes you want to write generic code that operates on arbitrary types, iterating over the
+fields or looking them up by name.  For example, you might want to write code that encodes
+arbitrary Cap'n Proto types in JSON format.  This requires something like "reflection", but C++
+does not offer reflection.  Also, you might even want to operate on types that aren't compiled
+into the binary at all, but only discovered at runtime.
+
+The C++ API supports inspecting schemas at runtime via the interface defined in
+`capnp/schema.h`, and dynamically reading and writing instances of arbitrary types via
+`capnp/dynamic.h`.  Here's the example from the beginning of this file rewritten in terms
+of the dynamic API:
+
+{% highlight c++ %}
+#include "addressbook.capnp.h"
+#include <capnp/message.h>
+#include <capnp/serialize-packed.h>
+#include <iostream>
+#include <capnp/schema.h>
+#include <capnp/dynamic.h>
+
+using ::capnp::DynamicValue;
+using ::capnp::DynamicStruct;
+using ::capnp::DynamicEnum;
+using ::capnp::DynamicList;
+using ::capnp::List;
+using ::capnp::Schema;
+using ::capnp::StructSchema;
+using ::capnp::EnumSchema;
+
+using ::capnp::Void;
+using ::capnp::Text;
+using ::capnp::MallocMessageBuilder;
+using ::capnp::PackedFdMessageReader;
+
+void dynamicWriteAddressBook(int fd, StructSchema schema) {
+  // Write a message using the dynamic API to set each
+  // field by text name.  This isn't something you'd
+  // normally want to do; it's just for illustration.
+
+  MallocMessageBuilder message;
+
+  // Types shown for explanation purposes; normally you'd
+  // use auto.
+  DynamicStruct::Builder addressBook =
+      message.initRoot<DynamicStruct>(schema);
+
+  DynamicList::Builder people =
+      addressBook.init("people", 2).as<DynamicList>();
+
+  DynamicStruct::Builder alice =
+      people[0].as<DynamicStruct>();
+  alice.set("id", 123);
+  alice.set("name", "Alice");
+  alice.set("email", "alice@example.com");
+  auto alicePhones = alice.init("phones", 1).as<DynamicList>();
+  auto phone0 = alicePhones[0].as<DynamicStruct>();
+  phone0.set("number", "555-1212");
+  phone0.set("type", "mobile");
+  alice.get("employment").as<DynamicStruct>()
+       .set("school", "MIT");
+
+  auto bob = people[1].as<DynamicStruct>();
+  bob.set("id", 456);
+  bob.set("name", "Bob");
+  bob.set("email", "bob@example.com");
+
+  // Some magic:  We can convert a dynamic sub-value back to
+  // the native type with as<T>()!
+  List<Person::PhoneNumber>::Builder bobPhones =
+      bob.init("phones", 2).as<List<Person::PhoneNumber>>();
+  bobPhones[0].setNumber("555-4567");
+  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
+  bobPhones[1].setNumber("555-7654");
+  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
+  bob.get("employment").as<DynamicStruct>()
+     .set("unemployed", ::capnp::VOID);
+
+  writePackedMessageToFd(fd, message);
+}
+
+void dynamicPrintValue(DynamicValue::Reader value) {
+  // Print an arbitrary message via the dynamic API by
+  // iterating over the schema.  Look at the handling
+  // of STRUCT in particular.
+
+  switch (value.getType()) {
+    case DynamicValue::VOID:
+      std::cout << "";
+      break;
+    case DynamicValue::BOOL:
+      std::cout << (value.as<bool>() ? "true" : "false");
+      break;
+    case DynamicValue::INT:
+      std::cout << value.as<int64_t>();
+      break;
+    case DynamicValue::UINT:
+      std::cout << value.as<uint64_t>();
+      break;
+    case DynamicValue::FLOAT:
+      std::cout << value.as<double>();
+      break;
+    case DynamicValue::TEXT:
+      std::cout << '\"' << value.as<Text>().cStr() << '\"';
+      break;
+    case DynamicValue::LIST: {
+      std::cout << "[";
+      bool first = true;
+      for (auto element: value.as<DynamicList>()) {
+        if (first) {
+          first = false;
+        } else {
+          std::cout << ", ";
+        }
+        dynamicPrintValue(element);
+      }
+      std::cout << "]";
+      break;
+    }
+    case DynamicValue::ENUM: {
+      auto enumValue = value.as<DynamicEnum>();
+      KJ_IF_MAYBE(enumerant, enumValue.getEnumerant()) {
+        std::cout <<
+            enumerant->getProto().getName().cStr();
+      } else {
+        // Unknown enum value; output raw number.
+        std::cout << enumValue.getRaw();
+      }
+      break;
+    }
+    case DynamicValue::STRUCT: {
+      std::cout << "(";
+      auto structValue = value.as<DynamicStruct>();
+      bool first = true;
+      for (auto field: structValue.getSchema().getFields()) {
+        if (!structValue.has(field)) continue;
+        if (first) {
+          first = false;
+        } else {
+          std::cout << ", ";
+        }
+        std::cout << field.getProto().getName().cStr()
+                  << " = ";
+        dynamicPrintValue(structValue.get(field));
+      }
+      std::cout << ")";
+      break;
+    }
+    default:
+      // There are other types, we aren't handling them.
+      std::cout << "?";
+      break;
+  }
+}
+
+void dynamicPrintMessage(int fd, StructSchema schema) {
+  PackedFdMessageReader message(fd);
+  dynamicPrintValue(message.getRoot<DynamicStruct>(schema));
+  std::cout << std::endl;
+}
+{% endhighlight %}
+
+Notes about the dynamic API:
+
+* You can implicitly cast any compiled Cap'n Proto struct reader/builder type directly to
+  `DynamicStruct::Reader`/`DynamicStruct::Builder`.  Similarly with `List<T>` and `DynamicList`,
+  and even enum types and `DynamicEnum`.  Finally, all valid Cap'n Proto field types may be
+  implicitly converted to `DynamicValue`.
+
+* You can load schemas dynamically at runtime using `SchemaLoader` (`capnp/schema-loader.h`) and
+  use the Dynamic API to manipulate objects of these types.  `MessageBuilder` and `MessageReader`
+  have methods for accessing the message root using a dynamic schema.
+
+* While `SchemaLoader` loads binary schemas, you can also parse directly from text using
+  `SchemaParser` (`capnp/schema-parser.h`).  However, this requires linking against `libcapnpc`
+  (in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient.  If
+  you can arrange to use only binary schemas at runtime, you'll be better off.
+
+* Unlike with Protobufs, there is no "global registry" of compiled-in types.  To get the schema
+  for a compiled-in type, use `capnp::Schema::from<MyType>()`.
+
+* Unlike with Protobufs, the overhead of supporting reflection is small.  Generated `.capnp.c++`
+  files contain only some embedded const data structures describing the schema, no code at all,
+  and the runtime library support code is relatively small.  Moreover, if you do not use the
+  dynamic API or the schema API, you do not even need to link their implementations into your
+  executable.
+
+* The dynamic API performs type checks at runtime.  In case of error, it will throw an exception.
+  If you compile with `-fno-exceptions`, it will crash instead.  Correct usage of the API should
+  never throw, but bugs happen.  Enabling and catching exceptions will make your code more robust.
+
+* Loading user-provided schemas has security implications: it greatly increases the attack
+  surface of the Cap'n Proto library.  In particular, it is easy for an attacker to trigger
+  exceptions.  To protect yourself, you are strongly advised to enable exceptions and catch them.
+
+## Orphans
+
+An "orphan" is a Cap'n Proto object that is disconnected from the message structure.  That is,
+it is not the root of a message, and there is no other Cap'n Proto object holding a pointer to it.
+Thus, it has no parents.  Orphans are an advanced feature that can help avoid copies and make it
+easier to use Cap'n Proto objects as part of your application's internal state.  Typical
+applications probably won't use orphans.
+
+The class `capnp::Orphan<T>` (defined in `<capnp/orphan.h>`) represents a pointer to an orphaned
+object of type `T`.  `T` can be any struct type, `List<T>`, `Text`, or `Data`.  E.g.
+`capnp::Orphan<Person>` would be an orphaned `Person` structure.  `Orphan<T>` is a move-only class,
+similar to `std::unique_ptr<T>`.  This prevents two different objects from adopting the same
+orphan, which would result in an invalid message.
+
+An orphan can be "adopted" by another object to link it into the message structure.  Conversely,
+an object can "disown" one of its pointers, causing the pointed-to object to become an orphan.
+Every pointer-typed field `foo` provides builder methods `adoptFoo()` and `disownFoo()` for these
+purposes.  Again, these methods use C++11 move semantics.  To use them, you will need to be
+familiar with `std::move()` (or the equivalent but shorter-named `kj::mv()`).
+
+Even though an orphan is unlinked from the message tree, it still resides inside memory allocated
+for a particular message (i.e. a particular `MessageBuilder`).  An orphan can only be adopted by
+objects that live in the same message.  To move objects between messages, you must perform a copy.
+If the message is serialized while an `Orphan<T>` living within it still exists, the orphan's
+content will be part of the serialized message, but the only way the receiver could find it is by
+investigating the raw message; the Cap'n Proto API provides no way to detect or read it.
+
+To construct an orphan from scratch (without having some other object disown it), you need an
+`Orphanage`, which is essentially an orphan factory associated with some message.  You can get one
+by calling the `MessageBuilder`'s `getOrphanage()` method, or by calling the static method
+`Orphanage::getForMessageContaining(builder)` and passing it any struct or list builder.
+
+Note that when an `Orphan<T>` goes out-of-scope without being adopted, the underlying memory that
+it occupied is overwritten with zeros.  If you use packed serialization, these zeros will take very
+little bandwidth on the wire, but will still waste memory on the sending and receiving ends.
+Generally, you should avoid allocating message objects that won't be used, or if you cannot avoid
+it, arrange to copy the entire message over to a new `MessageBuilder` before serializing, since
+only the reachable objects will be copied.
+
+## Reference
+
+The runtime library contains lots of useful features not described on this page.  For now, the
+best reference is the header files.  See:
+
+    capnp/list.h
+    capnp/blob.h
+    capnp/message.h
+    capnp/serialize.h
+    capnp/serialize-packed.h
+    capnp/schema.h
+    capnp/schema-loader.h
+    capnp/dynamic.h
+
+## Tips and Best Practices
+
+Here are some tips for using the C++ Cap'n Proto runtime most effectively:
+
+* Accessor methods for primitive (non-pointer) fields are fast and inline.  They should be just
+  as fast as accessing a struct field through a pointer.
+
+* Accessor methods for pointer fields, on the other hand, are not inline, as they need to validate
+  the pointer.  If you intend to access the same pointer multiple times, it is a good idea to
+  save the value to a local variable to avoid repeating this work.  This is generally not a
+  problem given C++11's `auto`.
+
+  Example:
+
+      // BAD
+      frob(foo.getBar().getBaz(),
+           foo.getBar().getQux(),
+           foo.getBar().getCorge());
+
+      // GOOD
+      auto bar = foo.getBar();
+      frob(bar.getBaz(), bar.getQux(), bar.getCorge());
+
+  It is especially important to use this style when reading messages, for another reason:  as
+  described under the "security tips" section, below, every time you `get` a pointer, Cap'n Proto
+  increments a counter by the size of the target object.  If that counter hits a pre-defined limit,
+  an exception is thrown (or a default value is returned, if exceptions are disabled), to prevent
+  a malicious client from sending your server into an infinite loop with a specially-crafted
+  message.  If you repeatedly `get` the same object, you are repeatedly counting the same bytes,
+  and so you may hit the limit prematurely.  (Since Cap'n Proto readers are backed directly by
+  the underlying message buffer and do not have anywhere else to store per-object information, it
+  is impossible to remember whether you've seen a particular object already.)
+
+* Internally, all pointer fields start out "null", even if they have default values.  When you have
+  a pointer field `foo` and you call `getFoo()` on the containing struct's `Reader`, if the field
+  is "null", you will receive a reader for that field's default value.  This reader is backed by
+  read-only memory; nothing is allocated.  However, when you call `get` on a _builder_, and the
+  field is null, then the implementation must make a _copy_ of the default value to return to you.
+  Thus, you've caused the field to become non-null, just by "reading" it.  On the other hand, if
+  you call `init` on that field, you are explicitly replacing whatever value is already there
+  (null or not) with a newly-allocated instance, and that newly-allocated instance is _not_ a
+  copy of the field's default value, but just a completely-uninitialized instance of the
+  appropriate type.
+
+* It is possible to receive a struct value constructed from a newer version of the protocol than
+  the one your binary was built with, and that struct might have extra fields that you don't know
+  about.  The Cap'n Proto implementation tries to avoid discarding this extra data.  If you copy
+  the struct from one message to another (e.g. by calling a set() method on a parent object), the
+  extra fields will be preserved.  This makes it possible to build proxies that receive messages
+  and forward them on without having to rebuild the proxy every time a new field is added.  You
+  must be careful, however:  in some cases, it's not possible to retain the extra fields, because
+  they need to be copied into a space that is allocated before the expected content is known.
+  In particular, lists of structs are represented as a flat array, not as an array of pointers.
+  Therefore, all memory for all structs in the list must be allocated upfront.  Hence, copying
+  a struct value from another message into an element of a list will truncate the value.  Because
+  of this, the setter method for struct lists is called `setWithCaveats()` rather than just `set()`.
+
+* Messages are built in "arena" or "region" style:  each object is allocated sequentially in
+  memory, until there is no more room in the segment, in which case a new segment is allocated,
+  and objects continue to be allocated sequentially in that segment.  This design is what makes
+  Cap'n Proto possible at all, and it is very fast compared to other allocation strategies.
+  However, it has the disadvantage that if you allocate an object and then discard it, that memory
+  is lost.  In fact, the empty space will still become part of the serialized message, even though
+  it is unreachable.  The implementation will try to zero it out, so at least it should pack well,
+  but it's still better to avoid this situation.  Some ways that this can happen include:
+  * If you `init` a field that is already initialized, the previous value is discarded.
+  * If you create an orphan that is never adopted into the message tree.
+  * If you use `adoptWithCaveats` to adopt an orphaned struct into a struct list, then a shallow
+    copy is necessary, since the struct list requires that its elements are sequential in memory.
+    The previous copy of the struct is discarded (although child objects are transferred properly).
+  * If you copy a struct value from another message using a `set` method, the copy will have the
+    same size as the original.  However, the original could have been built with an older version
+    of the protocol which lacked some fields compared to the version your program was built with.
+    If you subsequently `get` that struct, the implementation will be forced to allocate a new
+    (shallow) copy which is large enough to hold all known fields, and the old copy will be
+    discarded.  Child objects will be transferred over without being copied -- though they might
+    suffer from the same problem if you `get` them later on.
+  Sometimes, avoiding these problems is too inconvenient.  Fortunately, it's also possible to
+  clean up the mess after-the-fact:  if you copy the whole message tree into a fresh
+  `MessageBuilder`, only the reachable objects will be copied, leaving out all of the unreachable
+  dead space.
+
+  In the future, Cap'n Proto may be improved such that it can re-use dead space in a message.
+  However, this will only improve things, not fix them entirely: fragementation could still leave
+  dead space.
+
+### Build Tips
+
+* If you are worried about the binary footprint of the Cap'n Proto library, consider statically
+  linking with the `--gc-sections` linker flag.  This will allow the linker to drop pieces of the
+  library that you do not actually use.  For example, many users do not use the dynamic schema and
+  reflection APIs, which contribute a large fraction of the Cap'n Proto library's overall
+  footprint.  Keep in mind that if you ever stringify a Cap'n Proto type, the stringification code
+  depends on the dynamic API; consider only using stringification in debug builds.
+
+  If you are dynamically linking against the system's shared copy of `libcapnp`, don't worry about
+  its binary size.  Remember that only the code which you actually use will be paged into RAM, and
+  those pages are shared with other applications on the system.
+
+  Also remember to strip your binary.  In particular, `libcapnpc` (the schema parser) has
+  excessively large symbol names caused by its use of template-based parser combinators.  Stripping
+  the binary greatly reduces its size.
+
+* The Cap'n Proto library has lots of debug-only asserts that are removed if you `#define NDEBUG`,
+  including in headers.  If you care at all about performance, you should compile your production
+  binaries with the `-DNDEBUG` compiler flag.  In fact, if Cap'n Proto detects that you have
+  optimization enabled but have not defined `NDEBUG`, it will define it for you (with a warning),
+  unless you define `DEBUG` or `KJ_DEBUG` to explicitly request debugging.
+
+### Security Tips
+
+Cap'n Proto has not yet undergone security review.  It most likely has some vulnerabilities.  You
+should not attempt to decode Cap'n Proto messages from sources you don't trust at this time.
+
+However, assuming the Cap'n Proto implementation hardens up eventually, then the following security
+tips will apply.
+
+* It is highly recommended that you enable exceptions.  When compiled with `-fno-exceptions`,
+  Cap'n Proto categorizes exceptions into "fatal" and "recoverable" varieties.  Fatal exceptions
+  cause the server to crash, while recoverable exceptions are handled by logging an error and
+  returning a "safe" garbage value.  Fatal is preferred in cases where it's unclear what kind of
+  garbage value would constitute "safe".  The more of the library you use, the higher the chance
+  that you will leave yourself open to the possibility that an attacker could trigger a fatal
+  exception somewhere.  If you enable exceptions, then you can catch the exception instead of
+  crashing, and return an error just to the attacker rather than to everyone using your server.
+
+  Basic parsing of Cap'n Proto messages shouldn't ever trigger fatal exceptions (assuming the
+  implementation is not buggy).  However, the dynamic API -- especially if you are loading schemas
+  controlled by the attacker -- is much more exception-happy.  If you cannot use exceptions, then
+  you are advised to avoid the dynamic API when dealing with untrusted data.
+
+* If you need to process schemas from untrusted sources, take them in binary format, not text.
+  The text parser is a much larger attack surface and not designed to be secure.  For instance,
+  as of this writing, it is trivial to deadlock the parser by simply writing a constant whose value
+  depends on itself.
+
+* Cap'n Proto automatically applies two artificial limits on messages for security reasons:
+  a limit on nesting dept, and a limit on total bytes traversed.
+
+  * The nesting depth limit is designed to prevent stack overflow when handling a deeply-nested
+    recursive type, and defaults to 64.  If your types aren't recursive, it is highly unlikely
+    that you would ever hit this limit, and even if they are recursive, it's still unlikely.
+
+  * The traversal limit is designed to defend against maliciously-crafted messages which use
+    pointer cycles or overlapping objects to make a message appear much larger than it looks off
+    the wire.  While cycles and overlapping objects are illegal, they are hard to detect reliably.
+    Instead, Cap'n Proto places a limit on how many bytes worth of objects you can _dereference_
+    before it throws an exception.  This limit is assessed every time you follow a pointer.  By
+    default, the limit is 64MiB (this may change in the future).  `StreamFdMessageReader` will
+    actually reject upfront any message which is larger than the traversal limit, even before you
+    start reading it.
+
+    If you need to write your code in such a way that you might frequently re-read the same
+    pointers, instead of increasing the traversal limit to the point where it is no longer useful,
+    consider simply copying the message into a new `MallocMessageBuilder` before starting.  Then,
+    the traversal limit will be enforced only during the copy.  There is no traversal limit on
+    objects once they live in a `MessageBuilder`, even if you use `.asReader()` to convert a
+    particular object's builder to the corresponding reader type.
+
+  Both limits may be increased using `capnp::ReaderOptions`, defined in `capnp/message.h`.
+
+* Remember that enums on the wire may have a numeric value that does not match any value defined
+  in the schema.  Your `switch()` statements must always have a safe default case.
+
+## Lessons Learned from Protocol Buffers
+
+The author of Cap'n Proto's C++ implementation also wrote (in the past) verison 2 of Google's
+Protocol Buffers.  As a result, Cap'n Proto's implementation benefits from a number of lessons
+learned the hard way:
+
+* Protobuf generated code is enormous due to the parsing and serializing code generated for every
+  class.  This actually poses a significant problem in practice -- there exist server binaries
+  containing literally hundreds of megabytes of compiled protobuf code.  Cap'n Proto generated code,
+  on the other hand, is almost entirely inlined accessors.  The only things that go into `.capnp.o`
+  files are default values for pointer fields (if needed, which is rare) and the encoded schema
+  (just the raw bytes of a Cap'n-Proto-encoded schema structure).  The latter could even be removed
+  if you don't use dynamic reflection.
+
+* The C++ Protobuf implementation used lots of dynamic initialization code (that runs before
+  `main()`) to do things like register types in global tables.  This proved problematic for
+  programs which linked in lots of protocols but needed to start up quickly.  Cap'n Proto does not
+  use any dynamic initializers anywhere, period.
+
+* The C++ Protobuf implementation makes heavy use of STL in its interface and implementation.
+  The proliferation of template instantiations gives the Protobuf runtime library a large footprint,
+  and using STL in the interface can lead to weird ABI problems and slow compiles.  Cap'n Proto
+  does not use any STL containers in its interface and makes sparing use in its implementation.
+  As a result, the Cap'n Proto runtime library is smaller, and code that uses it compiles quickly.
+
+* The in-memory representation of messages in Protobuf-C++ involves many heap objects.  Each
+  message (struct) is an object, each non-primitive repeated field allocates an array of pointers
+  to more objects, and each string may actually add two heap objects.  Cap'n Proto by its nature
+  uses arena allocation, so the entire message is allocated in a few contiguous segments.  This
+  means Cap'n Proto spends very little time allocating memory, stores messages more compactly, and
+  avoids memory fragmentation.
+
+* Related to the last point, Protobuf-C++ relies heavily on object reuse for performance.
+  Building or parsing into a newly-allocated Protobuf object is significantly slower than using
+  an existing one.  However, the memory usage of a Protobuf object will tend to grow the more times
+  it is reused, particularly if it is used to parse messages of many different "shapes", so the
+  objects need to be deleted and re-allocated from time to time.  All this makes tuning Protobufs
+  fairly tedious.  In contrast, enabling memory reuse with Cap'n Proto is as simple as providing
+  a byte buffer to use as scratch space when you build or read in a message.  Provide enough scratch
+  space to hold the entire message and Cap'n Proto won't allocate any memory.  Or don't -- since
+  Cap'n Proto doesn't do much allocation in the first place, the benefits of scratch space are
+  small.
author	Chris Cannam <cannam@all-day-breakfast.com>
date	Tue, 25 Oct 2016 11:17:01 +0100
parents
children