sv-dependency-builds: src/capnproto-git-20161025/doc/faq.md annotate

annotate src/capnproto-git-20161025/doc/faq.md @ 149:279b18cc7785

Update Win32 capnp builds to v0.6

author	Chris Cannam <cannam@all-day-breakfast.com>
date	Tue, 23 May 2017 09:16:54 +0100
parents	1ac99bfc383d
children

rev	line source
cannam@133	1 ---
cannam@133	2 layout: page
cannam@133	3 title: FAQ
cannam@133	4 ---
cannam@133	5
cannam@133	6 # FAQ
cannam@133	7
cannam@133	8 ## Design
cannam@133	9
cannam@133	10 ### Isn't I/O bandwidth more important than CPU usage? Is Cap'n Proto barking up the wrong tree?
cannam@133	11
cannam@133	12 It depends. What is your use case?
cannam@133	13
cannam@133	14 Are you communicating between two processes on the same machine? If so, you have unlimited
cannam@133	15 bandwidth, and you should be entirely concerned with CPU.
cannam@133	16
cannam@133	17 Are you communicating between two machines within the same datacenter? If so, it's unlikely that
cannam@133	18 you will saturate your network connection before your CPU. Possible, but unlikely.
cannam@133	19
cannam@133	20 Are you communicating across the general internet? In that case, bandwidth is probably your main
cannam@133	21 concern. Luckily, Cap'n Proto lets you choose to enable "packing" in this case, achieving similar
cannam@133	22 encoding size to Protocol Buffers while still being faster. And you can always add extra
cannam@133	23 compression on top of that.
cannam@133	24
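To make "packing" a little more concrete, here is a minimal sketch of the zero-byte elision idea, written from the packing scheme described in the Cap'n Proto encoding documentation. The function names `packWords` and `isZeroWord` are made up for this example, and the sketch always takes the simplest legal choice after an all-nonzero word, so treat it as an illustration rather than a reference implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the 8-byte word starting at `pos` is all zeros.
static bool isZeroWord(const std::vector<std::uint8_t>& data, std::size_t pos) {
  for (std::size_t b = 0; b < 8; ++b) {
    if (data[pos + b] != 0) return false;
  }
  return true;
}

// Packs a buffer whose length is a multiple of 8 bytes (one word each).
// Hypothetical helper, not part of the Cap'n Proto library.
std::vector<std::uint8_t> packWords(const std::vector<std::uint8_t>& input) {
  assert(input.size() % 8 == 0);
  std::vector<std::uint8_t> out;
  std::size_t pos = 0;
  while (pos < input.size()) {
    // Tag byte: bit b is set when byte b of the word is nonzero.
    std::uint8_t tag = 0;
    for (std::size_t b = 0; b < 8; ++b) {
      if (input[pos + b] != 0) tag = static_cast<std::uint8_t>(tag | (1u << b));
    }
    out.push_back(tag);
    // Only the nonzero bytes follow the tag.
    for (std::size_t b = 0; b < 8; ++b) {
      if (input[pos + b] != 0) out.push_back(input[pos + b]);
    }
    pos += 8;
    if (tag == 0x00) {
      // An all-zero word is followed by a count of further all-zero words.
      std::uint8_t run = 0;
      while (run < 255 && pos < input.size() && isZeroWord(input, pos)) {
        ++run;
        pos += 8;
      }
      out.push_back(run);
    } else if (tag == 0xff) {
      // An all-nonzero word is followed by a count of words copied verbatim;
      // this sketch always emits 0 and packs the next word normally.
      out.push_back(0);
    }
  }
  return out;
}
```

Under these assumptions, the 16 input bytes `08 00 00 00 03 00 02 00 19 00 00 00 aa 01 00 00` pack down to the 8 bytes `51 08 03 02 31 19 aa 01`, which is why default-valued (zero) fields cost very little once packing is enabled.
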
cannam@133	25 ### Have you considered building the RPC system on ZeroMQ?
cannam@133	26
cannam@133	27 ZeroMQ (and its successor, Nanomsg) is a powerful technology for distributed computing. Its
cannam@133	28 design focuses on scenarios involving lots of stateless, fault-tolerant worker processes
cannam@133	29 communicating via various patterns, such as request/response, produce/consume, and
cannam@133	30 publish/subscribe. For big data processing where armies of stateless nodes make sense, pairing
cannam@133	31 Cap'n Proto with ZeroMQ would be an excellent choice -- and this is easy to do today, as ZeroMQ
cannam@133	32 is entirely serialization-agnostic.
cannam@133	33
cannam@133	34 That said, Cap'n Proto RPC takes a very different approach. Cap'n Proto's model focuses on
cannam@133	35 stateful servers interacting in complex, object-oriented ways. The model is better suited to
cannam@133	36 tasks involving applications with many heterogeneous components and interactions between
cannam@133	37 mutually-distrusting parties. Requests and responses can go in any direction. Objects have
cannam@133	38 state and so two calls to the same object had best go to the same machine. Load balancing and
cannam@133	39 fault tolerance are pushed up the stack, because without a large pool of homogeneous work there's
cannam@133	40 just no way to make them transparent at a low level.
cannam@133	41
cannam@133	42 Put concretely, you might build a search engine indexing pipeline on ZeroMQ, but an online
cannam@133	43 interactive spreadsheet editor would be better built on Cap'n Proto RPC.
cannam@133	44
cannam@133	45 (Actually, a distributed programming framework providing similar features to ZeroMQ could itself be
cannam@133	46 built on top of Cap'n Proto RPC.)
cannam@133	47
cannam@133	48 ### Aren't messages that contain pointers a huge security problem?
cannam@133	49
cannam@133	50 Not at all. Cap'n Proto bounds-checks each pointer when it is read and throws an exception or
cannam@133	51 returns a safe dummy value (your choice) if the pointer is out-of-bounds.
cannam@133	52
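The bounds check described above can be pictured with a toy model. This is a sketch only: `ToyPointer`, `OnBadPointer`, and `readFirstWord` are hypothetical names invented here, and the real library's internal representation differs. The point is that the check happens at the moment of the read, and the caller chooses between an exception and a safe default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy stand-in for a received message: a flat buffer of 64-bit words.
// None of these names come from the Cap'n Proto library.
using Word = std::uint64_t;

enum class OnBadPointer { Throw, ReturnDefault };

// A "pointer" here is just a word offset plus a word count into the buffer.
struct ToyPointer {
  std::size_t offset;
  std::size_t count;
};

// Reads the first word the pointer refers to, checking bounds when it is read.
Word readFirstWord(const std::vector<Word>& message, ToyPointer ptr,
                   OnBadPointer policy, Word safeDefault = 0) {
  // Compare count against the space remaining after offset, which avoids
  // overflow in offset + count for attacker-chosen values.
  bool inBounds = ptr.count > 0 &&
                  ptr.offset < message.size() &&
                  ptr.count <= message.size() - ptr.offset;
  if (!inBounds) {
    if (policy == OnBadPointer::Throw) {
      throw std::out_of_range("pointer out of bounds");
    }
    return safeDefault;  // safe dummy value instead of touching memory
  }
  return message[ptr.offset];
}
```

With a three-word message, a pointer at offset 1 with count 2 reads normally, while a pointer at offset 2 with count 5 either throws or yields the default, depending on the policy passed in.
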
cannam@133	53 ### So it's not that you've eliminated parsing, you've just moved it to happen lazily?
cannam@133	54
cannam@133	55 No. Compared to Protobuf decoding, the time spent validating pointers while traversing a Cap'n
cannam@133	56 Proto message is negligible.
cannam@133	57
cannam@133	58 ### I think I heard somewhere that capability-based security doesn't work?
cannam@133	59
cannam@133	60 This was a popular myth in security circles way back in the '80s and '90s, based on an incomplete
cannam@133	61 understanding of what capabilities are and how to use them effectively. Read
cannam@133	62 [Capability Myths Demolished](http://zesty.ca/capmyths/usenix.pdf). (No really, read it;
cannam@133	63 it's awesome.)
cannam@133	64
cannam@133	65 ## Usage
cannam@133	66
cannam@133	67 ### How do I make a field "required", like in Protocol Buffers?
cannam@133	68
cannam@133	69 You don't. You may find this surprising, but the "required" keyword in Protocol Buffers turned
cannam@133	70 out to be a horrible mistake.
cannam@133	71
cannam@133	72 For background, in Protocol Buffers, a field could be marked "required" to indicate that parsing
cannam@133	73 should fail if the sender forgot to set the field before sending the message. Required fields were
cannam@133	74 encoded exactly the same as optional ones; the only difference was the extra validation.
cannam@133	75
cannam@133	76 The problem with this is, validation is sometimes more subtle than that. Sometimes, different
cannam@133	77 applications -- or different parts of the same application, or different versions of the same
cannam@133	78 application -- place different requirements on the same protocol. An application may want to
cannam@133	79 pass around partially-complete messages internally. A particular field that used to be required
cannam@133	80 might become optional. A new use case might call for almost exactly the same message type, minus
cannam@133	81 one field, at which point it may make more sense to reuse the type than to define a new one.
cannam@133	82
cannam@133	83 A field declared required, unfortunately, is required everywhere. The validation is baked into
cannam@133	84 the parser, and there's nothing you can do about it. Nothing, that is, except change the field
cannam@133	85 from "required" to "optional". But that's where the _real_ problems start.
cannam@133	86
cannam@133	87 Imagine a production environment in which two servers, Alice and Bob, exchange messages through a
cannam@133	88 message bus infrastructure running on a big corporate network. The message bus parses each message
cannam@133	89 just to examine the envelope and decide how to route it, without paying attention to any other
cannam@133	90 content. Often, messages from various applications are batched together and then split up again
cannam@133	91 downstream.
cannam@133	92
cannam@133	93 Now, at some point, Alice's developers decide that one of the fields in a deeply-nested message
cannam@133	94 commonly sent to Bob has become obsolete. To clean things up, they decide to remove it, so they
cannam@133	95 change the field from "required" to "optional". The developers aren't idiots, so they realize that
cannam@133	96 Bob needs to be updated as well. They make the changes to Bob, and just to be thorough they
cannam@133	97 run an integration test with Alice and Bob running in a test environment. The test environment
cannam@133	98 is always running the latest build of the message bus, but that's irrelevant anyway because the
cannam@133	99 message bus doesn't actually care about message contents; it only does routing. Protocols are
cannam@133	100 modified all the time without updating the message bus.
cannam@133	101
cannam@133	102 Satisfied with their testing, the devs push a new version of Alice to prod. Immediately,
cannam@133	103 everything breaks. And by "everything" I don't just mean Alice and Bob. Completely unrelated
cannam@133	104 servers are getting strange errors or failing to receive messages. The whole data center has
cannam@133	105 ground to a halt and the sysadmins are running around with their hair on fire.
cannam@133	106
cannam@133	107 What happened? Well, the message bus running in prod was still an older build from before the
cannam@133	108 protocol change. And even though the message bus doesn't care about message content, it _does_
cannam@133	109 need to parse every message just to read the envelope. And the protobuf parser checks the _entire_
cannam@133	110 message for missing required fields. So when Alice stopped sending that newly-optional field, the
cannam@133	111 whole message failed to parse, envelope and all. And to make matters worse, any other messages
cannam@133	112 that happened to be in the same batch _also_ failed to parse, causing errors in seemingly-unrelated
cannam@133	113 systems that share the bus.
cannam@133	114
cannam@133	115 Things like this have actually happened. At Google. Many times.
cannam@133	116
cannam@133	117 The right answer is for applications to do validation as needed in application-level code. If you
cannam@133	118 want to detect when a client fails to set a particular field, give the field an invalid default
cannam@133	119 value and then check for that value on the server. Low-level infrastructure that doesn't care
cannam@133	120 about message content should not validate it at all.
cannam@133	121
cannam@133	122 Oh, and also, Cap'n Proto doesn't have any parsing step during which to check for required
cannam@133	123 fields. :)
cannam@133	124
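As a rough sketch of the application-level approach recommended above, the example below uses plain C++ structs in place of generated message classes. `ToyMessage`, `kUserIdUnset`, `routeOf`, and `serverAccepts` are all hypothetical names; in a real schema the invalid default would be declared on the field itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sentinel default that no legitimate sender would ever set (an assumption
// of this example; pick whatever value is invalid in your own protocol).
constexpr std::int64_t kUserIdUnset = -1;

// Hypothetical decoded message, standing in for a generated reader class.
struct ToyMessage {
  std::string destination;             // the "envelope": all the bus needs
  std::int64_t userId = kUserIdUnset;  // application-level content
};

// Low-level infrastructure: reads the envelope only and validates nothing else.
const std::string& routeOf(const ToyMessage& m) { return m.destination; }

// Application code on the receiving server: the requirement lives here,
// so it can change without touching the message bus.
bool serverAccepts(const ToyMessage& m) { return m.userId != kUserIdUnset; }
```

A message with no `userId` set still routes to its destination; only the server that actually needs the field rejects it.
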
cannam@133	125 ### How do I make a field optional?
cannam@133	126
cannam@133	127 Cap'n Proto has no notion of "optional" fields.
cannam@133	128
cannam@133	129 A primitive field always takes space on the wire whether you set it or not (although default-valued
cannam@133	130 fields will be compressed away if you enable packing). Such a field can be made semantically
cannam@133	131 optional by placing it in a union with a `Void` field:
cannam@133	132
cannam@133	133 {% highlight capnp %}
cannam@133	134 union {
cannam@133	135   age @0 :Int32;
cannam@133	136   ageUnknown @1 :Void;
cannam@133	137 }
cannam@133	138 {% endhighlight %}
cannam@133	139
cannam@133	140 However, this field still takes space on the wire, and in fact takes an extra 16 bits of space
cannam@133	141 for the union tag. A better approach may be to give the field a bogus default value and interpret
cannam@133	142 that value to mean "not present".
cannam@133	143
cannam@133	144 Pointer fields are a bit different. They start out "null", and you can check for nullness using
cannam@133	145 the `hasFoo()` accessor. You could use a null pointer to mean "not present". Note, though, that
cannam@133	146 calling `getFoo()` on a null pointer returns the default value, which is indistinguishable from a
cannam@133	147 legitimate value, so checking `hasFoo()` is in fact the _only_ way to detect nullness.
cannam@133	148
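The difference between the two accessors can be modelled in a few lines. This is a toy class, not the generated Cap'n Proto code: `ToyTextField` and its `has()`, `get()`, and `set()` methods are invented here to mirror the behaviour described above for `hasFoo()` and `getFoo()`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy model of a pointer-typed text field; not the generated accessors.
class ToyTextField {
 public:
  // True only if the field has been set, even if it was set to the default.
  bool has() const { return value_ != nullptr; }

  // A null field reads as the default value (here, the empty string).
  std::string get() const { return value_ ? *value_ : std::string(); }

  void set(const std::string& v) { value_.reset(new std::string(v)); }

 private:
  std::unique_ptr<std::string> value_;  // null until set
};
```

In this model, a field that was never set and a field explicitly set to `""` return the same thing from `get()`; only `has()` tells them apart.
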
cannam@133	149 ### How do I resize a list?
cannam@133	150
cannam@133	151 Unfortunately, you can't. You have to know the size of your list upfront, before you initialize
cannam@133	152 any of the elements. This is an annoying side effect of arena allocation, which is a fundamental
cannam@133	153 part of Cap'n Proto's design: in order to avoid making a copy later, all of the pieces of the
cannam@133	154 message must be allocated in a tightly-packed segment of memory, with each new piece being added
cannam@133	155 to the end. If a previously-allocated piece is discarded, it leaves a hole, which wastes space.
cannam@133	156 Since Cap'n Proto lists are flat arrays, the only way to resize a list would be to discard the
cannam@133	157 existing list and allocate a new one, which would thus necessarily waste space.
cannam@133	158
cannam@133	159 In theory, a more complicated memory allocation algorithm could attempt to reuse the "holes" left
cannam@133	160 behind by discarded message pieces. However, it would be hard to make sure any new data inserted
cannam@133	161 into the space is exactly the right size. Fragmentation would result. And the allocator would
cannam@133	162 have to do a lot of extra bookkeeping that could be expensive. This would be sad, as arena
cannam@133	163 allocation is supposed to be cheap!
cannam@133	164
cannam@133	165 The only solution is to temporarily place your data into some other data structure (an
cannam@133	166 `std::vector`, perhaps) until you know how many elements you have, then allocate the list and copy.
cannam@133	167 On the bright side, you probably aren't losing much performance this way -- using vectors already
cannam@133	168 involves making copies every time the backing array grows. It's just annoying to code.
cannam@133	169
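The following sketch illustrates the trade-off with a toy bump arena rather than the real message builder. `ToyArena`, `ToyList`, `buildList`, and `growByReallocating` are hypothetical names for this example only. It shows the recommended pattern (stage in a `std::vector`, then allocate once) next to the wasteful alternative (allocate a larger list and abandon the old one).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bump arena: space is only handed out from the end and never reclaimed.
class ToyArena {
 public:
  // Allocates `count` contiguous elements; returns the index of the first.
  std::size_t allocate(std::size_t count) {
    std::size_t start = storage_.size();
    storage_.resize(start + count);
    return start;
  }
  std::int32_t& at(std::size_t index) {
    assert(index < storage_.size());
    return storage_[index];
  }
  std::size_t elementsUsed() const { return storage_.size(); }

 private:
  std::vector<std::int32_t> storage_;
};

// A fixed-size list is just a range inside the arena.
struct ToyList {
  std::size_t start;
  std::size_t size;
};

// Recommended pattern: the size is known, so allocate once and copy in.
ToyList buildList(ToyArena& arena, const std::vector<std::int32_t>& staged) {
  ToyList list{arena.allocate(staged.size()), staged.size()};
  for (std::size_t i = 0; i < staged.size(); ++i) {
    arena.at(list.start + i) = staged[i];
  }
  return list;
}

// "Resizing" by reallocation: the old elements stay behind as a hole.
ToyList growByReallocating(ToyArena& arena, ToyList old, std::size_t newSize) {
  ToyList grown{arena.allocate(newSize), newSize};
  for (std::size_t i = 0; i < old.size && i < newSize; ++i) {
    arena.at(grown.start + i) = arena.at(old.start + i);
  }
  return grown;
}
```

Building a three-element list from a staged vector uses exactly three elements of arena space. Building a two-element list and then growing it to three uses five, because the original two are never reclaimed.
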
cannam@133	170 Keep in mind that you can use [orphans](cxx.html#orphans) to allocate sub-objects before you have
cannam@133	171 a place to put them. But, also note that you cannot allocate elements of a struct list as orphans
cannam@133	172 and then put them together as a list later, because struct lists are encoded as a flat array of
cannam@133	173 struct values, not an array of pointers to struct values. You can, however, allocate any inner
cannam@133	174 objects embedded within those structs as orphans.
cannam@133	175
cannam@133	176 ## Security
cannam@133	177
cannam@133	178 ### Is Cap'n Proto secure?
cannam@133	179
cannam@133	180 What is your threat model?
cannam@133	181
cannam@133	182 ### Sorry. Can Cap'n Proto be used to deserialize malicious messages?
cannam@133	183
cannam@133	184 Cap'n Proto's serialization layer is designed to be safe against malicious input. The Cap'n Proto implementation should never segfault, corrupt memory, leak secrets, execute attacker-specified code, consume excessive resources, etc. as a result of any sequence of input bytes. Moreover, the API is carefully designed to avoid putting app developers into situations where it is easy to write insecure code -- we consider it a bug in Cap'n Proto if apps commonly misuse it in a way that is a security problem.
cannam@133	185
cannam@133	186 With all that said, Cap'n Proto's C++ reference implementation has not yet undergone a formal security review. It may have bugs.
cannam@133	187
cannam@133	188 ### Is it safe to use Cap'n Proto RPC with a malicious peer?
cannam@133	189
cannam@133	190 Cap'n Proto's RPC layer is explicitly designed to be useful for interactions between mutually-distrusting parties. Its capability-based security model makes it easy to express complex interactions securely.
cannam@133	191
cannam@133	192 At this time, the RPC layer is not robust against resource exhaustion attacks, possibly allowing denials of service.
cannam@133	193
cannam@133	194 ### Is Cap'n Proto encrypted?
cannam@133	195
cannam@133	196 Cap'n Proto may be layered on top of an existing encrypted transport, such as TLS, but at this time it is the application's responsibility to add this layer. We plan to integrate this into the Cap'n Proto library proper in the future.
cannam@133	197
cannam@133	198 ### How do I report security bugs?
cannam@133	199
cannam@133	200 Please email [security@sandstorm.io](mailto:security@sandstorm.io).
cannam@133	201
cannam@133	202 ## Sandstorm
cannam@133	203
cannam@133	204 ### How does Cap'n Proto relate to Sandstorm.io?
cannam@133	205
cannam@133	206 [Sandstorm.io](https://sandstorm.io) is an Open Source project and startup founded by Kenton, the author of Cap'n Proto. Cap'n Proto is owned and developed by Sandstorm the company and heavily used in Sandstorm the project.
cannam@133	207
cannam@133	208 ### How does Sandstorm use Cap'n Proto?
cannam@133	209
cannam@133	210 See [this Sandstorm blog post](https://blog.sandstorm.io/news/2014-12-15-capnproto-0.5.html).
cannam@133	211
