---
layout: post
title: "Cap'n Proto v0.2: Compiler rewritten Haskell -> C++"
author: kentonv
---

Today I am releasing version 0.2 of Cap'n Proto. The most notable change: the compiler / code
generator, which was previously written in Haskell, has been rewritten in C++11. There are a few
other changes as well, but before I talk about those, let me try to calm the angry mob that is
no doubt reaching for their pitchforks as we speak. There are a few reasons for this change,
some practical, some ideological. I'll start with the practical.

**The practical: Supporting dynamic languages**

Say you are trying to implement Cap'n Proto in an interpreted language like Python. One of the big
draws of such a language is that you can edit your code and then run it without an intervening
compile step, allowing you to iterate faster. But if the Python Cap'n Proto implementation worked
like the C++ one (or like Protobufs), you would lose some of that: whenever you change your Cap'n Proto
schema files, you must run a command to regenerate the Python code from them. That sucks.

What you really want to do is parse the schemas at start-up -- the same time that the Python code
itself is parsed. But writing a proper schema parser is harder than it looks; you really should
reuse the existing implementation. If it is written in Haskell, that's going to be problematic.
You either need to invoke the schema parser as a sub-process or you need to call Haskell code from
Python via an FFI. Either approach is going to be a huge hack with lots of problems, not the least
of which is having a runtime dependency on an entire platform that your end users may not otherwise
want.

But with the schema parser written in C++, things become much simpler. Python code calls into
C/C++ all the time. Everyone already has the necessary libraries installed. There's no need to
generate code, even; the parsed schema can be fed into the Cap'n Proto C++ runtime's dynamic API,
and Python bindings can trivially be implemented on top of that in just a few hundred lines of
code. Everyone wins.

**The ideological: I'm an object-oriented programmer**

I really wanted to like Haskell. I used to be a strong proponent of functional programming, and
I actually once wrote a complete web server and CMS in a purely-functional toy language of my own
creation. I love strong static typing, and I find a lot of the constructs in Haskell really
powerful and beautiful. Even monads. _Especially_ monads.

But when it comes down to it, I am an object-oriented programmer, and Haskell is not an
object-oriented language. Yes, you can do object-oriented style if you want to, just like you
can do objects in C. But it's just too painful. I want to write `object.methodName`, not
`ModuleName.objectTypeMethodName object`. I want to be able to write lots of small classes that
encapsulate complex functionality in simple interfaces -- _without_ having to place each one in
a whole separate module and ending up with thousands of source files. I want to be able to build
a list of objects of varying types that implement the same interface without having to re-invent
virtual tables every time I do it (type classes don't quite solve the problem).
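As a rough illustration of the pattern described above, here is a minimal C++ sketch: one list holds objects of different concrete types that all implement a single interface, and the compiler generates the virtual table that dispatches each call. The `Shape`, `Square`, `Rectangle`, and `totalArea` names are hypothetical, chosen only for this example; nothing here comes from the Cap'n Proto code base.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface shared by every element of the list.
class Shape {
public:
  virtual ~Shape() = default;
  virtual std::string name() const = 0;
  virtual double area() const = 0;
};

class Square final : public Shape {
public:
  explicit Square(double side) : side_(side) { assert(side >= 0); }
  std::string name() const override { return "square"; }
  double area() const override { return side_ * side_; }

private:
  double side_;
};

class Rectangle final : public Shape {
public:
  Rectangle(double width, double height) : width_(width), height_(height) {
    assert(width >= 0 && height >= 0);
  }
  std::string name() const override { return "rectangle"; }
  double area() const override { return width_ * height_; }

private:
  double width_;
  double height_;
};

// Works on any mix of Shape implementations; each area() call is dispatched
// through the compiler-generated vtable to the right concrete class.
double totalArea(const std::vector<std::unique_ptr<Shape>>& shapes) {
  double total = 0;
  for (const auto& shape : shapes) {
    total += shape->area();
  }
  return total;
}
```

Adding a new kind of shape means writing one more small class; neither the list type nor `totalArea` has to change.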

And as it turns out, even aside from the lack of object-orientation, I don't actually like
functional programming as much as I thought. Yes, writing my parser was super-easy (my first
commit message was
"[Day 1: Learn Haskell, write a parser](https://github.com/kentonv/capnproto/commit/6bb49ca775501a9b2c7306992fd0de53c5ee4e95)").
But everything beyond that seemed to require increasing amounts of brain bending. For instance, to
actually encode a Cap'n Proto message, I couldn't just allocate a buffer of zeros and then go
through each field and set its value. Instead, I had to compute all the field values first, sort
them by position, then concatenate the results.
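The two encoding strategies can be contrasted in a short C++ sketch. It assumes a simplified, hypothetical model in which each field is just a run of bytes placed at a fixed offset inside a fixed-size data section; it is not the real Cap'n Proto encoder, nor code from either version of the compiler. `encodeInPlace` is the fill-a-zeroed-buffer approach, and `encodeBySorting` mirrors the compute, sort, and concatenate pipeline (written in C++ here, so it still appends to a vector).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: a field is a run of bytes that must land at a
// fixed byte offset inside a fixed-size data section. Fields must not overlap.
struct FieldValue {
  std::size_t offset;
  std::vector<std::uint8_t> bytes;
};

// Imperative style: allocate a zero-filled buffer, then set each field in place,
// in whatever order the fields happen to arrive.
std::vector<std::uint8_t> encodeInPlace(std::size_t size,
                                        const std::vector<FieldValue>& fields) {
  std::vector<std::uint8_t> buffer(size, 0);
  for (const FieldValue& field : fields) {
    assert(field.offset + field.bytes.size() <= size);
    std::copy(field.bytes.begin(), field.bytes.end(),
              buffer.begin() + static_cast<std::ptrdiff_t>(field.offset));
  }
  return buffer;
}

// Sort-and-concatenate style: order the precomputed values by position, then
// join them, emitting explicit zero padding for every gap between fields.
std::vector<std::uint8_t> encodeBySorting(std::size_t size,
                                          std::vector<FieldValue> fields) {
  std::sort(fields.begin(), fields.end(),
            [](const FieldValue& a, const FieldValue& b) {
              return a.offset < b.offset;
            });
  std::vector<std::uint8_t> out;
  for (const FieldValue& field : fields) {
    assert(field.offset >= out.size());  // overlap would break concatenation
    out.resize(field.offset, 0);         // pad up to the field's position
    out.insert(out.end(), field.bytes.begin(), field.bytes.end());
  }
  assert(out.size() <= size);
  out.resize(size, 0);  // pad the tail of the section
  return out;
}
```

Both functions produce the same bytes for non-overlapping fields, but the second has to reason about ordering and gaps explicitly, which is exactly the bookkeeping the in-place version gets for free.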

Of course, I'm sure it's the case that if I spent years writing Haskell code, I'd eventually become
as proficient with it as I am with C++. Perhaps I could un-learn object-oriented style and learn
something else that works just as well or better. Basically, though, I decided that this was
going to take a lot longer than it at first appeared, and that this wasn't a good use of my
limited resources. So, I'm cutting my losses.

I still think Haskell is a very interesting language, and if it works for you, by all means, use it.
I would love to see someone write an actual Cap'n Proto runtime implementation in Haskell. But
the compiler is now C++.

**Parser Combinators in C++**

A side effect (so to speak) of the compiler rewrite is that Cap'n Proto's companion utility
library, KJ, now includes a parser combinator framework based on C++11 templates and lambdas.
Here's a sample:

{% highlight c++ %}
// Construct a parser that parses a number.
auto number = transform(
    sequence(
        oneOrMore(charRange('0', '9')),
        optional(sequence(
            exactChar<'.'>(),
            many(charRange('0', '9'))))),
    [](Array<char> whole, Maybe<Array<char>> maybeFraction)
        -> Number* {
      KJ_IF_MAYBE(fraction, maybeFraction) {
        return new RealNumber(whole, *fraction);
      } else {
        return new WholeNumber(whole);
      }
    });
{% endhighlight %}

An interesting fact about the above code is that constructing the parser itself does not allocate
anything on the heap. The variable `number` in this case ends up being one 96-byte flat object,
most of which is composed of tables for character matching. The whole thing could even be
declared `constexpr`... if the C++ standard allowed empty-capture lambdas to be `constexpr`, which
unfortunately it doesn't (yet).
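To show why a parser built this way can be a flat value with no heap allocation, here is a stripped-down combinator sketch in C++17. It is not KJ's implementation: the `combinators` namespace, `CharRange`, `OneOrMore`, `Transform`, and their factory functions are hypothetical. Each parser is a small struct (or a struct holding other parsers plus a lambda) whose `operator()` consumes a `std::string_view`. Composing parsers only nests these values, so building one allocates nothing; running `OneOrMore` does allocate, because it collects its matches into a `std::string`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

namespace combinators {  // hypothetical namespace, not KJ

// A parse either fails (std::nullopt) or yields a value plus the remaining input.
template <typename T>
using ParseResult = std::optional<std::pair<T, std::string_view>>;

// Matches a single character in the inclusive range [lo, hi].
struct CharRange {
  char lo;
  char hi;
  ParseResult<char> operator()(std::string_view in) const {
    if (!in.empty() && in.front() >= lo && in.front() <= hi) {
      return std::make_pair(in.front(), in.substr(1));
    }
    return std::nullopt;
  }
};

// Applies a character parser repeatedly; fails unless it matches at least once.
template <typename Sub>
struct OneOrMore {
  Sub sub;
  ParseResult<std::string> operator()(std::string_view in) const {
    std::string matched;  // allocated while parsing, not while constructing
    while (auto r = sub(in)) {
      matched.push_back(r->first);
      in = r->second;
    }
    if (matched.empty()) return std::nullopt;
    return std::make_pair(std::move(matched), in);
  }
};

// Runs a sub-parser, then maps its value through fn.
template <typename Sub, typename Fn>
struct Transform {
  Sub sub;
  Fn fn;
  auto operator()(std::string_view in) const {
    auto r = sub(in);
    using Out = decltype(fn(std::move(r->first)));
    if (!r) return ParseResult<Out>(std::nullopt);
    return ParseResult<Out>(std::make_pair(fn(std::move(r->first)), r->second));
  }
};

// Factory functions: composing parsers just nests these small values.
inline CharRange charRange(char lo, char hi) {
  assert(lo <= hi);
  return CharRange{lo, hi};
}

template <typename Sub>
OneOrMore<Sub> oneOrMore(Sub sub) {
  return OneOrMore<Sub>{sub};
}

template <typename Sub, typename Fn>
Transform<Sub, Fn> transform(Sub sub, Fn fn) {
  return Transform<Sub, Fn>{sub, fn};
}

}  // namespace combinators
```

For example, `transform(oneOrMore(charRange('0', '9')), [](std::string s) { return std::stoi(s); })` yields a parser that turns `"123abc"` into the value `123` with `"abc"` left unconsumed, and fails on `"abc"`.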

Unfortunately, KJ is largely undocumented at the moment, since people who just want to use
Cap'n Proto generally don't need to know about it.

**Other New Features**

There are a couple of other notable changes in this release, aside from the compiler:

* Cygwin has been added as a supported platform, meaning you can now use Cap'n Proto on Windows.
  I am considering supporting MinGW as well. Unfortunately, MSVC is unlikely to be supported any
  time soon, as its C++11 support is
  [woefully lacking](http://blogs.msdn.com/b/somasegar/archive/2013/06/28/cpp-conformance-roadmap.aspx).

* The new compiler binary -- now called `capnp` rather than `capnpc` -- is more of a multi-tool.
  It includes the ability to decode binary messages to text as a debugging aid. Type
  `capnp help decode` for more information.

* The new [Orphan]({{ site.baseurl }}/cxx.html#orphans) class lets you detach objects from a
  message tree and re-attach them elsewhere.

* Various contributors have declared their intentions to implement
  [Ruby](https://github.com/cstrahan/capnp-ruby),
  [Rust](https://github.com/dwrensha/capnproto-rust), C#, Java, Erlang, and Delphi bindings. These
  are still works in progress, but exciting nonetheless!

**Backwards-compatibility Note**

Cap'n Proto v0.2 contains an obscure wire format incompatibility with v0.1. If you are using
unions containing multiple primitive-type fields of varying sizes, it's possible that the new
compiler will position those fields differently. A work-around to get back to the old layout
exists; if you believe you could be affected, please [send me](mailto:temporal@gmail.com) your
schema and I'll tell you what to do. [Gory details.](https://groups.google.com/d/msg/capnproto/NIYbD0haP38/pH5LildInwIJ)

**Road Map**

v0.3 will come in a couple of weeks and will include several new features and clean-ups that can now
be implemented more easily given the new compiler. This will also hopefully be the first release
that officially supports a language other than C++.

The following release, v0.4, will hopefully be the first release implementing RPC.

_PS. If you are wondering, compared to the Haskell version, the new compiler has about 50% more
lines of code and is about 4x faster. The speed increase should be taken with a grain of salt,
though, as my Haskell code did all kinds of horribly slow things. The code size is, I think, not
bad, considering that Haskell specializes in concision -- but, again, I'm sure a Haskell expert
could have written shorter code._