diff src/capnproto-git-20161025/doc/_posts/2013-08-12-capnproto-0.2-no-more-haskell.md @ 133:1ac99bfc383d

Add Cap'n Proto source
author Chris Cannam <cannam@all-day-breakfast.com>
date Tue, 25 Oct 2016 11:17:01 +0100
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/capnproto-git-20161025/doc/_posts/2013-08-12-capnproto-0.2-no-more-haskell.md	Tue Oct 25 11:17:01 2016 +0100
@@ -0,0 +1,144 @@
+---
+layout: post
+title: "Cap'n Proto v0.2: Compiler rewritten Haskell -> C++"
+author: kentonv
+---
+
+Today I am releasing version 0.2 of Cap'n Proto.  The most notable change: the compiler / code
+generator, which was previously written in Haskell, has been rewritten in C++11.  There are a few
+other changes as well, but before I talk about those, let me try to calm the angry mob that is
+not doubt reaching for their pitchforks as we speak.  There are a few reasons for this change,
+some practical, some ideological.  I'll start with the practical.
+
+**The practical:  Supporting dynamic languages**
+
+Say you are trying to implement Cap'n Proto in an interpreted language like Python.  One of the big
+draws of such a language is that you can edit your code and then run it without an intervening
+compile step, allowing you to iterate faster.  But if the Python Cap'n Proto implementation worked
+like the C++ one (or like Protobufs), you lose some of that: whenever you change your Cap'n Proto
+schema files, you must run a command to regenerate the Python code from them.  That sucks.
+
+What you really want to do is parse the schemas at start-up -- the same time that the Python code
+itself is parsed.  But writing a proper schema parser is harder than it looks; you really should
+reuse the existing implementation.  If it is written in Haskell, that's going to be problematic.
+You either need to invoke the schema parser as a sub-process or you need to call Haskell code from
+Python via an FFI.  Either approach is going to be a huge hack with lots of problems, not the least
+of which is having a runtime dependency on an entire platform that your end users may not otherwise
+want.
+
+But with the schema parser written in C++, things become much simpler.  Python code calls into
+C/C++ all the time.  Everyone already has the necessary libraries installed.  There's no need to
+generate code, even; the parsed schema can be fed into the Cap'n Proto C++ runtime's dynamic API,
+and Python bindings can trivially be implemented on top of that in just a few hundred lines of
+code.  Everyone wins.
+
+**The ideological:  I'm an object-oriented programmer**
+
+I really wanted to like Haskell.  I used to be a strong proponent of functional programming, and
+I actually once wrote a complete web server and CMS in a purely-functional toy language of my own
+creation.  I love strong static typing, and I find a lot of the constructs in Haskell really
+powerful and beautiful.  Even monads.  _Especially_ monads.
+
+But when it comes down to it, I am an object-oriented programmer, and Haskell is not an
+object-oriented language.  Yes, you can do object-oriented style if you want to, just like you
+can do objects in C.  But it's just too painful.  I want to write `object.methodName`, not
+`ModuleName.objectTypeMethodName object`.  I want to be able to write lots of small classes that
+encapsulate complex functionality in simple interfaces -- _without_ having to place each one in
+a whole separate module and ending up with thousands of source files.  I want to be able to build
+a list of objects of varying types that implement the same interface without having to re-invent
+virtual tables every time I do it (type classes don't quite solve the problem).
+
+And as it turns out, even aside from the lack of object-orientation, I don't actually like
+functional programming as much as I thought.  Yes, writing my parser was super-easy (my first
+commit message was
+"[Day 1: Learn Haskell, write a parser](https://github.com/kentonv/capnproto/commit/6bb49ca775501a9b2c7306992fd0de53c5ee4e95)").
+But everything beyond that seemed to require increasing amounts of brain bending.  For instance, to
+actually encode a Cap'n Proto message, I couldn't just allocate a buffer of zeros and then go
+through each field and set its value.  Instead, I had to compute all the field values first, sort
+them by position, then concatenate the results.
+
+Of course, I'm sure it's the case that if I spent years writing Haskell code, I'd eventually become
+as proficient with it as I am with C++.  Perhaps I could un-learn object-oriented style and learn
+something else that works just as well or better.  Basically, though, I decided that this was
+going to take a lot longer than it at first appeared, and that this wasn't a good use of my
+limited resources.  So, I'm cutting my losses.
+
+I still think Haskell is a very interesting language, and if works for you, by all means, use it.
+I would love to see someone write at actual Cap'n Proto runtime implementation in Haskell.  But
+the compiler is now C++.
+
+**Parser Combinators in C++**
+
+A side effect (so to speak) of the compiler rewrite is that Cap'n Proto's companion utility
+library, KJ, now includes a parser combinator framework based on C++11 templates and lambdas.
+Here's a sample:
+
+{% highlight c++ %}
+// Construct a parser that parses a number.
+auto number = transform(
+    sequence(
+        oneOrMore(charRange('0', '9')),
+        optional(sequence(
+            exactChar<'.'>(),
+            many(charRange('0', '9'))))),
+    [](Array<char> whole, Maybe<Array<char>> maybeFraction)
+        -> Number* {
+      KJ_IF_MAYBE(fraction, maybeFraction) {
+        return new RealNumber(whole, *fraction);
+      } else {
+        return new WholeNumber(whole);
+      }
+    });
+{% endhighlight %}
+
+An interesting fact about the above code is that constructing the parser itself does not allocate
+anything on the heap.  The variable `number` in this case ends up being one 96-byte flat object,
+most of which is composed of tables for character matching.  The whole thing could even be
+declared `constexpr`...  if the C++ standard allowed empty-capture lambdas to be `constexpr`, which
+unfortunately it doesn't (yet).
+
+Unfortunately, KJ is largely undocumented at the moment, since people who just want to use
+Cap'n Proto generally don't need to know about it.
+
+**Other New Features**
+
+There are a couple other notable changes in this release, aside from the compiler:
+
+* Cygwin has been added as a supported platform, meaning you can now use Cap'n Proto on Windows.
+  I am considering supporting MinGW as well.  Unfortunately, MSVC is unlikely to be supported any
+  time soon as its C++11 support is
+  [woefully lacking](http://blogs.msdn.com/b/somasegar/archive/2013/06/28/cpp-conformance-roadmap.aspx).
+
+* The new compiler binary -- now called `capnp` rather than `capnpc` -- is more of a multi-tool.
+  It includes the ability to decode binary messages to text as a debugging aid.  Type
+  `capnp help decode` for more information.
+
+* The new [Orphan]({{ site.baseurl }}/cxx.html#orphans) class lets you detach objects from a
+  message tree and re-attach them elsewhere.
+
+* Various contributors have declared their intentions to implement
+  [Ruby](https://github.com/cstrahan/capnp-ruby),
+  [Rust](https://github.com/dwrensha/capnproto-rust), C#, Java, Erlang, and Delphi bindings.  These
+  are still works in progress, but exciting nonetheless!
+
+**Backwards-compatibility Note**
+
+Cap'n Proto v0.2 contains an obscure wire format incompatibility with v0.1.  If you are using
+unions containing multiple primitive-type fields of varying sizes, it's possible that the new
+compiler will position those fields differently.  A work-around to get back to the old layout
+exists; if you believe you could be affected, please [send me](mailto:temporal@gmail.com) your
+schema and I'll tell you what to do.  [Gory details.](https://groups.google.com/d/msg/capnproto/NIYbD0haP38/pH5LildInwIJ)
+
+**Road Map**
+
+v0.3 will come in a couple weeks and will include several new features and clean-ups that can now
+be implemented more easily given the new compiler.  This will also hopefully be the first release
+that officially supports a language other than C++.
+
+The following release, v0.4, will hopefully be the first release implementing RPC.
+
+_PS.  If you are wondering, compared to the Haskell version, the new compiler is about 50% more
+lines of code and about 4x faster.  The speed increase should be taken with a grain of salt,
+though, as my Haskell code did all kinds of horribly slow things.  The code size is, I think, not
+bad, considering that Haskell specializes in concision -- but, again, I'm sure a Haskell expert
+could have written shorter code._