Wednesday, January 19, 2011

Monad.Zero -- doing it right

For my own benefit (see title of blog) as much as anything else. When I wrote

Rule 1 of monadic parsers is -- if you return Monad.Zero, you're probably doing it wrong... It's likely that you mean monadic return of an empty list.

it's because there's a distinction between positively returning an empty result (i.e. successfully doing nothing) and returning nothing at all (not succeeding in what you were trying to do) -- as the F# documentation for Zero puts it

Called for empty else branches of if...then expressions in computation expressions.

So, for a parser monad Zero is the function that returns a parser that takes no input and always fails:

for the Enumerable or Seq monad, it's the empty sequence, for Maybe its None and so forth -- basically the only thing you can return if you don't have any value to return.

And for individual leaf parsers -- the functions that define an Operation<TInput, TOutput> and return them wrapped as a Parser<TInput, TOutput>, those Operation<TInput, TOutput>s should return the same non-value when faced with a parse failure.

By contrast, when composing a parser that may try to consume some of the input, but fail, and you want to not fail the whole parse, you need a parser that successfully matches nothing (and consumes nothing) as an alternative that provides a success value -- and that is not Zero, but rather Return of the non-value of the appropriate type. So, for example, the parser combinator Many:

is "Consume as many blocks of p as you can -- or, if that fails, do nothing"; by contrast, the combinator Sat fails if the predicate doesn't match:

So in that case it says Zero and means Zero because that's a failure to satisfy the parser -- here in the equivalent the F# computation expression we don't even have to write the else branch; it's filled in for us by the compiler.

And that's really the one subtlety to be borne in mind -- otherwise you end up with an almost intractable debugging issue when your parser fails unexpectedly, partway through consuming valid input.

Tuesday, January 18, 2011

Monadic Parsers in C#

Updated 17-Jan -- because I realised I'd gotten Input and Output the wrong way around; too easy to do when you're using the same type in each case.

Updated 18-Jan -- In which we bug-fix the heck out of the previous two iterations, and pretty much completely rewrite this post. Rule 1 of monadic parsers is -- if you return Monad.Zero, you're probably doing it wrong... It's likely that you mean monadic return of an empty list.

Following on from Mike Hadlow's recent series of posts on monads in C#, and the "Aha!" moment that SelectMany actually means Bind, in the same way that Select means map, I thought I'd take it for a spin on something more than just the usual Identity, Maybe and State monads.

For a long while now there have been things like illustrative F# ports of a Haskell monadic parser -- so why not, I thought, try to emulate that in C#?

So we have a type which looks like this (abusing KeyValuePair as a 2-tuple for .net 3.5 compatibility) --

which is a monad upon the return type; the input stream parametrization being orthogonal to this.

The monad itself (and here I've reverted to using Bind for the simple case analogous to the F# case) looks like

Here ToParser is the convenience function-to-monad constructor; and Return is what lifts an output value into the monad -- creating a parser that places that output value into the output stream.

The combinators assemble all the parsers from the atomic units -- some here are more specialized than others.

And we can top this off with an example

which produces a successfully parsed output

key=[3; 5; 5; 3]; value=[]
key=[35]; value=[]

So what lessons do I draw from this exercise?

The big one is that old saying about needing to be twice as clever to debug code as to write it; and code that works with continuation passing can get pretty darn clever. Or, as Eric Lippert put it (in the context of writing asynchronous code in this style in C#):

Remember what I was saying about the pros and cons of CPS?

  • PRO: Arbitrarily complex and interesting control flows can be built out of simple parts – check.
  • CON: The reification of control flow via continuations is hard to read and hard to reason about – check.
  • CON: The code that represents the mechanisms of control flow completely overwhelms the meaning of the code – check.
  • CON: The transformation of ordinary code control flow into CPS is the kind of thing that compilers are good at, and almost no one else is – check.

The parts are indeed individually simple enough; but all the rest is sadly true. We can see how clever, what the actual power-to-weight ratio is, or how much what actually gets executed differs from what it superficially looks like we've written, by replacing the initialization in ToParser with

for a static integer value max in the class. Running the test example, we see that the stack depths achieved by these simple parsers are 64 and 95 respectively -- causes and effects are remote from one another (with the side effect that the only code not covered in this belongs to the subtract and divide operator parsers, and the failure handlers in the main program.

Coupled with this is the difficulty of doing anything meaningful in the way of fine-grained TDD in the bootstrap phase -- the individual pieces are not parsers, are comparatively trivial, and even those return functions operating on functions; it's in combination that they do their magic, and there's a lot of code needed just to read one token out of a stream.

Having the F# equivalent to work with -- to provide guidance of what types all the individual pieces were, and then in essence manually de-sugaring the computation expression notation -- was invaluable. But if you have the F# version, there is little need for the transposition except as a five-finger exercise like this.

On the positive side, though, once you have the toolkit like this and it handles even the simple cases, that probably means that it is robust; and then it can pretty much be taken as a black box, for those times when you absolutely have to do it in C# rather than F#.

Sunday, January 16, 2011

More Garden Rubbish

Insanely mild weather today meant that when I slipped out in bare feet and T-shirt to check the greenhouse, I ended up doing a bit more pottering around in the garden to catch up with some of the tidying from the autumn tasks without feeling the need to wear more.

I did put on a shirt and sandals for the catch-up pruning of the apple trees and digging over the beds where the annual flowers and the broccoli, which didn't survive the harsh weather in Nov-Dec, had been.

Tuesday, January 11, 2011

Using F# 2.0 Powerpack ArgParser from C# -- (ii)

Having resolved the issue of why the "--" separator wasn't triggering (PowerShell was swallowing it before it got to my code) that was stumping me in the previous post, I put together a little fluent wrapper to make this a little easier to consume from C#. The code that isolates the F# types looks like this:

The calling code, for the same example as before looks like

which is much clearer.

Now edited for FxCop.

Monday, January 10, 2011

Garden Rubbish and Meaningless Numbers

You can tell quite how early -- late November -- the big freeze set in, as for the last couple of weekends I was doing things like raking leaves off the lawn, as well as other winter chores like pruning roses and clearing other dead foliage from expected dieback (enough to fill the green bin). The cold was also enough that inside the greenhouse, even under a further bubble-wrap tent with a heater set to frost-free, some of the pelargoniums got scorched -- not something I've had happen before in a winterised greenhouse.

In the weekend clearout, I got rid of the last tomato plants from the tent, harvesting one last tomato. Sunday I baked a crumble and almost used up the last of the apples -- leaving two Charles Ross picked and packed remained in a box of fruit that had mostly spontaneously self-destructed; and a large windfall Bramley (we ate the former raw and baked the latter for dessert today). Many of the picked-as-ripe Bramleys also self-destructed from within in the last couple of weeks; the windfalls were just as durable as the cosseted ones.

And some numbers -- at 08:39-08:40 on 21-Oct-10, the digital clock and odometer coincided; as again, cheating at 09:50 BST on 2-Nov; not cheating at 17:43-40 on 6-Jan-11; 1000 miles on 4-Nov. The bike read 1033.9 miles at the end of the year (and also on the 6 month mark of having the bike computer, 7-Jan).

Using F# 2.0 Powerpack ArgParser from C#

This is an update of the technique mentioned in an antique post by Robert Pickering which was referenced by Laurent Le Brun last summer. I've taken Laurent's example of F# usage and ported it to C# in the most direct fashion:

It is possible to streamline this to hide most of the messy type names inside utilities.

One thing I've not managed to do is get the lambda associated with the ArgType.Rest to fire, even after setting the default handler opt1 to FSharpOption<T>.None. That -- the default handler -- gets fired every time an otherwise unmatched argument is encountered; the ArgType.Rest handler looks from the code like it should "just work" so I'm a bit baffled ATM.

LATER: Mystery solved

Of course, I was testing this at a PowerShell prompt -- and the shell was swallowing the unquoted "--" string : there was no "rest" to operate on.