annotate src/capnproto-git-20161025/doc/_posts/2013-08-12-capnproto-0.2-no-more-haskell.md @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI compatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 9530b331f8c1
children
rev   line source
cannam@48 1 ---
cannam@48 2 layout: post
cannam@48 3 title: "Cap'n Proto v0.2: Compiler rewritten Haskell -> C++"
cannam@48 4 author: kentonv
cannam@48 5 ---
cannam@48 6
cannam@48 7 Today I am releasing version 0.2 of Cap'n Proto. The most notable change: the compiler / code
cannam@48 8 generator, which was previously written in Haskell, has been rewritten in C++11. There are a few
cannam@48 9 other changes as well, but before I talk about those, let me try to calm the angry mob that is
cannam@48 10 no doubt reaching for their pitchforks as we speak. There are a few reasons for this change,
cannam@48 11 some practical, some ideological. I'll start with the practical.
cannam@48 12
cannam@48 13 **The practical: Supporting dynamic languages**
cannam@48 14
cannam@48 15 Say you are trying to implement Cap'n Proto in an interpreted language like Python. One of the big
cannam@48 16 draws of such a language is that you can edit your code and then run it without an intervening
cannam@48 17 compile step, allowing you to iterate faster. But if the Python Cap'n Proto implementation worked
cannam@48 18 like the C++ one (or like Protobufs), you lose some of that: whenever you change your Cap'n Proto
cannam@48 19 schema files, you must run a command to regenerate the Python code from them. That sucks.
cannam@48 20
cannam@48 21 What you really want to do is parse the schemas at start-up -- the same time that the Python code
cannam@48 22 itself is parsed. But writing a proper schema parser is harder than it looks; you really should
cannam@48 23 reuse the existing implementation. If it is written in Haskell, that's going to be problematic.
cannam@48 24 You either need to invoke the schema parser as a sub-process or you need to call Haskell code from
cannam@48 25 Python via an FFI. Either approach is going to be a huge hack with lots of problems, not the least
cannam@48 26 of which is having a runtime dependency on an entire platform that your end users may not otherwise
cannam@48 27 want.
cannam@48 28
cannam@48 29 But with the schema parser written in C++, things become much simpler. Python code calls into
cannam@48 30 C/C++ all the time. Everyone already has the necessary libraries installed. There's no need to
cannam@48 31 generate code, even; the parsed schema can be fed into the Cap'n Proto C++ runtime's dynamic API,
cannam@48 32 and Python bindings can trivially be implemented on top of that in just a few hundred lines of
cannam@48 33 code. Everyone wins.
cannam@48 34
cannam@48 35 **The ideological: I'm an object-oriented programmer**
cannam@48 36
cannam@48 37 I really wanted to like Haskell. I used to be a strong proponent of functional programming, and
cannam@48 38 I actually once wrote a complete web server and CMS in a purely-functional toy language of my own
cannam@48 39 creation. I love strong static typing, and I find a lot of the constructs in Haskell really
cannam@48 40 powerful and beautiful. Even monads. _Especially_ monads.
cannam@48 41
cannam@48 42 But when it comes down to it, I am an object-oriented programmer, and Haskell is not an
cannam@48 43 object-oriented language. Yes, you can do object-oriented style if you want to, just like you
cannam@48 44 can do objects in C. But it's just too painful. I want to write `object.methodName`, not
cannam@48 45 `ModuleName.objectTypeMethodName object`. I want to be able to write lots of small classes that
cannam@48 46 encapsulate complex functionality in simple interfaces -- _without_ having to place each one in
cannam@48 47 a whole separate module and ending up with thousands of source files. I want to be able to build
cannam@48 48 a list of objects of varying types that implement the same interface without having to re-invent
cannam@48 49 virtual tables every time I do it (type classes don't quite solve the problem).
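
To make the "list of objects of varying types" point concrete, here is a minimal C++ sketch. The names `Node`, `StructNode`, `EnumNode`, and `describeAll` are hypothetical and chosen only for illustration; they are not taken from the Cap'n Proto compiler.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface, for illustration only.
class Node {
public:
  virtual ~Node() = default;
  virtual std::string describe() const = 0;
};

// Two small classes implementing the same interface.
class StructNode final : public Node {
public:
  explicit StructNode(std::string name) : name_(std::move(name)) {}
  std::string describe() const override { return "struct " + name_; }

private:
  std::string name_;
};

class EnumNode final : public Node {
public:
  explicit EnumNode(std::string name) : name_(std::move(name)) {}
  std::string describe() const override { return "enum " + name_; }

private:
  std::string name_;
};

// One list holding mixed concrete types. The compiler-generated virtual
// table picks the right describe() for each element.
std::string describeAll(const std::vector<std::unique_ptr<Node>>& nodes) {
  std::string out;
  for (const auto& node : nodes) {
    out += node->describe();
    out += '\n';
  }
  return out;
}
```

Both classes live in one file, and callers of `describeAll` never need to know which concrete types are in the list.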
cannam@48 50
cannam@48 51 And as it turns out, even aside from the lack of object-orientation, I don't actually like
cannam@48 52 functional programming as much as I thought. Yes, writing my parser was super-easy (my first
cannam@48 53 commit message was
cannam@48 54 "[Day 1: Learn Haskell, write a parser](https://github.com/kentonv/capnproto/commit/6bb49ca775501a9b2c7306992fd0de53c5ee4e95)").
cannam@48 55 But everything beyond that seemed to require increasing amounts of brain bending. For instance, to
cannam@48 56 actually encode a Cap'n Proto message, I couldn't just allocate a buffer of zeros and then go
cannam@48 57 through each field and set its value. Instead, I had to compute all the field values first, sort
cannam@48 58 them by position, then concatenate the results.
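
The two encoding strategies described above can be sketched side by side in C++. This is a simplified illustration, not the compiler's actual code: the `Field` type and both function names are hypothetical, every field is assumed to be a non-overlapping 32-bit value that fits inside the buffer, and values are copied in host byte order rather than the little-endian order the real wire format uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical field description: byte offset plus a 32-bit value.
struct Field {
  std::size_t offset;
  std::uint32_t value;
};

// Imperative style: allocate a buffer of zeros, then set each field in place.
// Assumes offset + 4 <= size for every field.
std::vector<std::uint8_t> encodeInPlace(const std::vector<Field>& fields,
                                        std::size_t size) {
  std::vector<std::uint8_t> buffer(size, 0);
  for (const Field& f : fields) {
    std::memcpy(buffer.data() + f.offset, &f.value, sizeof f.value);
  }
  return buffer;
}

// Pure-functional style: sort the fields by position, then concatenate them,
// filling the gaps with zeros. Assumes fields do not overlap.
std::vector<std::uint8_t> encodeBySorting(std::vector<Field> fields,
                                          std::size_t size) {
  std::sort(fields.begin(), fields.end(),
            [](const Field& a, const Field& b) { return a.offset < b.offset; });
  std::vector<std::uint8_t> out;
  for (const Field& f : fields) {
    out.resize(f.offset, 0);  // zero-fill the gap before this field
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(&f.value);
    out.insert(out.end(), bytes, bytes + sizeof f.value);
  }
  out.resize(size, 0);  // zero-fill the tail
  return out;
}
```

Both functions produce the same bytes for the same input. The first simply never has to think about field order.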
cannam@48 59
cannam@48 60 Of course, I'm sure it's the case that if I spent years writing Haskell code, I'd eventually become
cannam@48 61 as proficient with it as I am with C++. Perhaps I could un-learn object-oriented style and learn
cannam@48 62 something else that works just as well or better. Basically, though, I decided that this was
cannam@48 63 going to take a lot longer than it at first appeared, and that this wasn't a good use of my
cannam@48 64 limited resources. So, I'm cutting my losses.
cannam@48 65
cannam@48 66 I still think Haskell is a very interesting language, and if it works for you, by all means, use it.
cannam@48 67 I would love to see someone write an actual Cap'n Proto runtime implementation in Haskell. But
cannam@48 68 the compiler is now C++.
cannam@48 69
cannam@48 70 **Parser Combinators in C++**
cannam@48 71
cannam@48 72 A side effect (so to speak) of the compiler rewrite is that Cap'n Proto's companion utility
cannam@48 73 library, KJ, now includes a parser combinator framework based on C++11 templates and lambdas.
cannam@48 74 Here's a sample:
cannam@48 75
cannam@48 76 {% highlight c++ %}
cannam@48 77 // Construct a parser that parses a number.
cannam@48 78 auto number = transform(
cannam@48 79     sequence(
cannam@48 80         oneOrMore(charRange('0', '9')),
cannam@48 81         optional(sequence(
cannam@48 82             exactChar<'.'>(),
cannam@48 83             many(charRange('0', '9'))))),
cannam@48 84     [](Array<char> whole, Maybe<Array<char>> maybeFraction)
cannam@48 85         -> Number* {
cannam@48 86       KJ_IF_MAYBE(fraction, maybeFraction) {
cannam@48 87         return new RealNumber(whole, *fraction);
cannam@48 88       } else {
cannam@48 89         return new WholeNumber(whole);
cannam@48 90       }
cannam@48 91     });
cannam@48 92 {% endhighlight %}
cannam@48 93
cannam@48 94 An interesting fact about the above code is that constructing the parser itself does not allocate
cannam@48 95 anything on the heap. The variable `number` in this case ends up being one 96-byte flat object,
cannam@48 96 most of which is composed of tables for character matching. The whole thing could even be
cannam@48 97 declared `constexpr`... if the C++ standard allowed empty-capture lambdas to be `constexpr`, which
cannam@48 98 unfortunately it doesn't (yet).
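
To show why building such a parser need not touch the heap, here is a stripped-down sketch of the same idea. It is not KJ's implementation: everything in the hypothetical `sketch` namespace is invented for illustration, it only recognizes input instead of producing values, and it compares characters directly where KJ uses matching tables. Each combinator is a plain struct that stores its sub-parsers by value, so the composed parser is one flat object.

```cpp
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical names, not the KJ API

// Matches one character in [lo, hi]. On failure, pos is left unchanged.
struct CharRange {
  char lo, hi;
  bool operator()(const std::string& in, std::size_t& pos) const {
    if (pos < in.size() && in[pos] >= lo && in[pos] <= hi) { ++pos; return true; }
    return false;
  }
};

// Zero or more repetitions. The sub-parser must consume input on success.
template <typename P>
struct Many {
  P sub;
  bool operator()(const std::string& in, std::size_t& pos) const {
    while (sub(in, pos)) {}
    return true;
  }
};

// One or more repetitions.
template <typename P>
struct OneOrMore {
  P sub;
  bool operator()(const std::string& in, std::size_t& pos) const {
    if (!sub(in, pos)) return false;
    while (sub(in, pos)) {}
    return true;
  }
};

// Runs two parsers in order, rewinding pos if either fails.
template <typename A, typename B>
struct Sequence {
  A first;
  B second;
  bool operator()(const std::string& in, std::size_t& pos) const {
    std::size_t start = pos;
    if (first(in, pos) && second(in, pos)) return true;
    pos = start;
    return false;
  }
};

// Always succeeds; consumes input only if the sub-parser matches.
template <typename P>
struct Optional {
  P sub;
  bool operator()(const std::string& in, std::size_t& pos) const {
    sub(in, pos);
    return true;
  }
};

inline CharRange charRange(char lo, char hi) { return CharRange{lo, hi}; }
template <typename P> Many<P> many(P p) { return Many<P>{p}; }
template <typename P> OneOrMore<P> oneOrMore(P p) { return OneOrMore<P>{p}; }
template <typename P> Optional<P> optional(P p) { return Optional<P>{p}; }
template <typename A, typename B>
Sequence<A, B> sequence(A a, B b) { return Sequence<A, B>{a, b}; }

// True if text is digits, optionally followed by '.' and more digits.
inline bool isNumber(const std::string& text) {
  // The whole parser is a small value on the stack; nothing is heap-allocated.
  auto number = sequence(
      oneOrMore(charRange('0', '9')),
      optional(sequence(charRange('.', '.'), many(charRange('0', '9')))));
  std::size_t pos = 0;
  return number(text, pos) && pos == text.size();
}

}  // namespace sketch
```

The type of `number` encodes the full grammar, so the compiler can inline the entire match loop.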
cannam@48 99
cannam@48 100 Unfortunately, KJ is largely undocumented at the moment, since people who just want to use
cannam@48 101 Cap'n Proto generally don't need to know about it.
cannam@48 102
cannam@48 103 **Other New Features**
cannam@48 104
cannam@48 105 There are a couple other notable changes in this release, aside from the compiler:
cannam@48 106
cannam@48 107 * Cygwin has been added as a supported platform, meaning you can now use Cap'n Proto on Windows.
cannam@48 108 I am considering supporting MinGW as well. Unfortunately, MSVC is unlikely to be supported any
cannam@48 109 time soon as its C++11 support is
cannam@48 110 [woefully lacking](http://blogs.msdn.com/b/somasegar/archive/2013/06/28/cpp-conformance-roadmap.aspx).
cannam@48 111
cannam@48 112 * The new compiler binary -- now called `capnp` rather than `capnpc` -- is more of a multi-tool.
cannam@48 113 It includes the ability to decode binary messages to text as a debugging aid. Type
cannam@48 114 `capnp help decode` for more information.
cannam@48 115
cannam@48 116 * The new [Orphan]({{ site.baseurl }}/cxx.html#orphans) class lets you detach objects from a
cannam@48 117 message tree and re-attach them elsewhere.
cannam@48 118
cannam@48 119 * Various contributors have declared their intentions to implement
cannam@48 120 [Ruby](https://github.com/cstrahan/capnp-ruby),
cannam@48 121 [Rust](https://github.com/dwrensha/capnproto-rust), C#, Java, Erlang, and Delphi bindings. These
cannam@48 122 are still works in progress, but exciting nonetheless!
cannam@48 123
cannam@48 124 **Backwards-compatibility Note**
cannam@48 125
cannam@48 126 Cap'n Proto v0.2 contains an obscure wire format incompatibility with v0.1. If you are using
cannam@48 127 unions containing multiple primitive-type fields of varying sizes, it's possible that the new
cannam@48 128 compiler will position those fields differently. A work-around to get back to the old layout
cannam@48 129 exists; if you believe you could be affected, please [send me](mailto:temporal@gmail.com) your
cannam@48 130 schema and I'll tell you what to do. [Gory details.](https://groups.google.com/d/msg/capnproto/NIYbD0haP38/pH5LildInwIJ)
cannam@48 131
cannam@48 132 **Road Map**
cannam@48 133
cannam@48 134 v0.3 will come in a couple weeks and will include several new features and clean-ups that can now
cannam@48 135 be implemented more easily given the new compiler. This will also hopefully be the first release
cannam@48 136 that officially supports a language other than C++.
cannam@48 137
cannam@48 138 The following release, v0.4, will hopefully be the first release implementing RPC.
cannam@48 139
cannam@48 140 _PS. If you are wondering, compared to the Haskell version, the new compiler is about 50% more
cannam@48 141 lines of code and about 4x faster. The speed increase should be taken with a grain of salt,
cannam@48 142 though, as my Haskell code did all kinds of horribly slow things. The code size is, I think, not
cannam@48 143 bad, considering that Haskell specializes in concision -- but, again, I'm sure a Haskell expert
cannam@48 144 could have written shorter code._