Distributed Memory: Monadic Parsers in C#

Tuesday, January 18, 2011

Monadic Parsers in C#

Updated 17-Jan -- because I realised I'd gotten Input and Output the wrong way around; too easy to do when you're using the same type in each case.

Updated 18-Jan -- In which we bug-fix the heck out of the previous two iterations, and pretty much completely rewrite this post. Rule 1 of monadic parsers is -- if you return Monad.Zero, you're probably doing it wrong... It's likely that you mean monadic return of an empty list.

Following on from Mike Hadlow's recent series of posts on monads in C#, and the "Aha!" moment that SelectMany actually means Bind, in the same way that Select means map, I thought I'd take it for a spin on something more than just the usual Identity, Maybe and State monads.

For a long while now there have been things like illustrative F# ports of a Haskell monadic parser -- so why not, I thought, try to emulate that in C#?

So we have a type which looks like this (abusing KeyValuePair as a 2-tuple for .net 3.5 compatibility) --

which is a monad upon the return type; the input stream parametrization being orthogonal to this.

The monad itself (and here I've reverted to using Bind for the simple case analogous to the F# case) looks like

Here ToParser is the convenience function-to-monad constructor; and Return is what lifts an output value into the monad -- creating a parser that places that output value into the output stream.

The combinators assemble all the parsers from the atomic units -- some here are more specialized than others.

And we can top this off with an example

which produces a successfully parsed output

key=[3; 5; 5; 3]; value=[]
key=[35]; value=[]

So what lessons do I draw from this exercise?

The big one is that old saying about needing to be twice as clever to debug code as to write it; and code that works with continuation passing can get pretty darn clever. Or, as Eric Lippert put it (in the context of writing asynchronous code in this style in C#):

Remember what I was saying about the pros and cons of CPS?
PRO: Arbitrarily complex and interesting control flows can be built out of simple parts – check.

CON: The reification of control flow via continuations is hard to read and hard to reason about – check.

CON: The code that represents the mechanisms of control flow completely overwhelms the meaning of the code – check.

CON: The transformation of ordinary code control flow into CPS is the kind of thing that compilers are good at, and almost no one else is – check.

The parts are indeed individually simple enough; but all the rest is sadly true. We can see how clever, what the actual power-to-weight ratio is, or how much what actually gets executed differs from what it superficially looks like we've written, by replacing the initialization in ToParser with

for a static integer value max in the class. Running the test example, we see that the stack depths achieved by these simple parsers are 64 and 95 respectively -- causes and effects are remote from one another (with the side effect that the only code not covered in this belongs to the subtract and divide operator parsers, and the failure handlers in the main program.

Coupled with this is the difficulty of doing anything meaningful in the way of fine-grained TDD in the bootstrap phase -- the individual pieces are not parsers, are comparatively trivial, and even those return functions operating on functions; it's in combination that they do their magic, and there's a lot of code needed just to read one token out of a stream.

Having the F# equivalent to work with -- to provide guidance of what types all the individual pieces were, and then in essence manually de-sugaring the computation expression notation -- was invaluable. But if you have the F# version, there is little need for the transposition except as a five-finger exercise like this.

On the positive side, though, once you have the toolkit like this and it handles even the simple cases, that probably means that it is robust; and then it can pretty much be taken as a black box, for those times when you absolutely have to do it in C# rather than F#.