---
layout: post
title: "Cap'n Proto v0.2: Compiler rewritten Haskell -> C++"
author: kentonv
---

Today I am releasing version 0.2 of Cap'n Proto. The most notable change: the compiler / code
generator, which was previously written in Haskell, has been rewritten in C++11. There are a few
other changes as well, but before I talk about those, let me try to calm the angry mob that is
no doubt reaching for their pitchforks as we speak. There are a few reasons for this change,
some practical, some ideological. I'll start with the practical.

**The practical: Supporting dynamic languages**

Say you are trying to implement Cap'n Proto in an interpreted language like Python. One of the big
draws of such a language is that you can edit your code and then run it without an intervening
compile step, allowing you to iterate faster. But if the Python Cap'n Proto implementation worked
like the C++ one (or like Protobufs), you'd lose some of that: whenever you changed your Cap'n Proto
schema files, you'd have to run a command to regenerate the Python code from them. That sucks.

What you really want to do is parse the schemas at start-up -- the same time that the Python code
itself is parsed. But writing a proper schema parser is harder than it looks; you really should
reuse the existing implementation. If it is written in Haskell, that's going to be problematic.
You either need to invoke the schema parser as a sub-process or you need to call Haskell code from
Python via an FFI. Either approach is going to be a huge hack with lots of problems, not the least
of which is having a runtime dependency on an entire platform that your end users may not otherwise
want.

But with the schema parser written in C++, things become much simpler. Python code calls into
C/C++ all the time. Everyone already has the necessary libraries installed. There's no need to
generate code, even; the parsed schema can be fed into the Cap'n Proto C++ runtime's dynamic API,
and Python bindings can trivially be implemented on top of that in just a few hundred lines of
code. Everyone wins.

**The ideological: I'm an object-oriented programmer**

I really wanted to like Haskell. I used to be a strong proponent of functional programming, and
I actually once wrote a complete web server and CMS in a purely functional toy language of my own
creation. I love strong static typing, and I find a lot of the constructs in Haskell really
powerful and beautiful. Even monads. _Especially_ monads.

But when it comes down to it, I am an object-oriented programmer, and Haskell is not an
object-oriented language. Yes, you can do object-oriented style if you want to, just like you
can do objects in C. But it's just too painful. I want to write `object.methodName`, not
`ModuleName.objectTypeMethodName object`. I want to be able to write lots of small classes that
encapsulate complex functionality in simple interfaces -- _without_ having to place each one in
a whole separate module and end up with thousands of source files. I want to be able to build
a list of objects of varying types that implement the same interface without having to re-invent
virtual tables every time I do it (type classes don't quite solve the problem).

And as it turns out, even aside from the lack of object-orientation, I don't actually like
functional programming as much as I thought.
Yes, writing my parser was super-easy (my first commit message was
"[Day 1: Learn Haskell, write a parser](https://github.com/kentonv/capnproto/commit/6bb49ca775501a9b2c7306992fd0de53c5ee4e95)").
But everything beyond that seemed to require increasing amounts of brain bending. For instance, to
actually encode a Cap'n Proto message, I couldn't just allocate a buffer of zeros and then go
through each field and set its value. Instead, I had to compute all the field values first, sort
them by position, and then concatenate the results.

Of course, I'm sure that if I spent years writing Haskell code, I'd eventually become
as proficient with it as I am with C++. Perhaps I could un-learn object-oriented style and learn
something else that works just as well or better. Basically, though, I decided that this was
going to take a lot longer than it at first appeared, and that this wasn't a good use of my
limited resources. So, I'm cutting my losses.

I still think Haskell is a very interesting language, and if it works for you, by all means, use it.
I would love to see someone write an actual Cap'n Proto runtime implementation in Haskell. But
the compiler is now C++.

**Parser Combinators in C++**

A side effect (so to speak) of the compiler rewrite is that Cap'n Proto's companion utility
library, KJ, now includes a parser combinator framework based on C++11 templates and lambdas.
Here's a sample:

{% highlight c++ %}
// Construct a parser that parses a number.
auto number = transform(
    sequence(
        oneOrMore(charRange('0', '9')),
        optional(sequence(
            exactChar<'.'>(),
            many(charRange('0', '9'))))),
    [](Array<char> whole, Maybe<Array<char>> maybeFraction)
        -> Number* {
      KJ_IF_MAYBE(fraction, maybeFraction) {
        return new RealNumber(whole, *fraction);
      } else {
        return new WholeNumber(whole);
      }
    });
{% endhighlight %}

An interesting fact about the above code is that constructing the parser itself does not allocate
anything on the heap. The variable `number` in this case ends up being one 96-byte flat object,
most of which consists of tables for character matching. The whole thing could even be
declared `constexpr`... if the C++ standard allowed empty-capture lambdas to be `constexpr`, which
unfortunately it doesn't (yet).

For now, KJ is largely undocumented, since people who just want to use
Cap'n Proto generally don't need to know about it.

**Other New Features**

There are a couple of other notable changes in this release, aside from the compiler:

* Cygwin has been added as a supported platform, meaning you can now use Cap'n Proto on Windows.
  I am considering supporting MinGW as well. Unfortunately, MSVC is unlikely to be supported any
  time soon, as its C++11 support is
  [woefully lacking](http://blogs.msdn.com/b/somasegar/archive/2013/06/28/cpp-conformance-roadmap.aspx).

* The new compiler binary -- now called `capnp` rather than `capnpc` -- is more of a multi-tool.
  It includes the ability to decode binary messages to text as a debugging aid. Type
  `capnp help decode` for more information.

* The new [Orphan]({{ site.baseurl }}/cxx.html#orphans) class lets you detach objects from a
  message tree and re-attach them elsewhere.

* Various contributors have declared their intentions to implement
  [Ruby](https://github.com/cstrahan/capnp-ruby),
  [Rust](https://github.com/dwrensha/capnproto-rust), C#, Java, Erlang, and Delphi bindings. These
  are still works in progress, but exciting nonetheless!

**Backwards-compatibility Note**

Cap'n Proto v0.2 contains an obscure wire-format incompatibility with v0.1. If you are using
unions containing multiple primitive-type fields of varying sizes, it's possible that the new
compiler will position those fields differently. A work-around to get back to the old layout
exists; if you believe you could be affected, please [send me](mailto:temporal@gmail.com) your
schema and I'll tell you what to do. [Gory details.](https://groups.google.com/d/msg/capnproto/NIYbD0haP38/pH5LildInwIJ)

**Road Map**

v0.3 will come in a couple of weeks and will include several new features and clean-ups that can now
be implemented more easily given the new compiler. This will also hopefully be the first release
that officially supports a language other than C++.

The following release, v0.4, will hopefully be the first release implementing RPC.

_PS. If you are wondering, compared to the Haskell version, the new compiler has about 50% more
lines of code and is about 4x faster. The speed increase should be taken with a grain of salt,
though, as my Haskell code did all kinds of horribly slow things.
The code size is, I think, not bad, considering that Haskell specializes in concision -- but,
again, I'm sure a Haskell expert could have written shorter code._