Tuesday, December 31, 2013

8600.5

Flooded again

Standing water

While we have had the occasional day of heavy rain, with a little bit of waterlogging -- though nothing quite as bad as last year,

Pines on the horizon

Bright Winter's Day

thanks to the continuing fair weather -- including a lovely few days at the start of the holidays -- I more than made all the mileage goals I'd set myself. On my bike that was 3008 miles, plus 433 on hired bikes for a total of 3441 miles; not quite the equivalent of commuting to work every other day.

Maybe I should try for the 10 miles a day average next year.



Monday, December 30, 2013

An introduction to Functional Programming in F# -- Part 6: Continuations and Monads

Hang onto your hats! This final part is where the ride gets wild.


Topics covered in the series

Alas, despite having trailed generative programming, I find I don't have any notes from the seminar series covering that topic explicitly (it was not one I was presenting). If I find something, I'll add it in as an extra post.


Continuations

This style of programming originally arose in handling data streams in a fashion using lazy evaluation, and uses closures to represent each step of the program. I/O is made available as a pair of lazy lists, a request list and a response list, and the program is a function which takes a response list to a request list:

type Request = Getq | Putq of char
type Response = Getp of char | Putp
type Dialog = Response list -> Request list
view raw gistfile1.fs hosted with ❤ by GitHub

For example:

let readChar f p =
Getq ::
(match p with
| (Getp c) :: p1 -> f c p1
| _ -> []);;
view raw gistfile1.fs hosted with ❤ by GitHub

The hidden causality here is that putting the Getq at the head of the request queue causes the Getp c to exist at the head of the response queue (perhaps by reading the keyboard).


In general, the continuation style uses chained closures to construct a complex compound operation from a collection of small basic operations. We introduce the types

type continuation = K of (unit -> unit)
let execute (K f) = f()
let stop = K (fun () -> ());;
view raw gistfile1.fs hosted with ❤ by GitHub

Our building blocks will mostly be producers and consumers, and others that are neither:

Producer: (X -> continuation) -> continuation
Consumer: X -> continuation -> continuation // strictly a continuation generator, as it produces a continuation from the output of the producer.
Operation: continuation -> continuation
view raw gistfile1.fs hosted with ❤ by GitHub
  • Where possible, the continuations are built during the construction phase.
  • Each step is provided with the rest of the execution as a continuation building block(s).
  • A producer is provided with a consumer which constructs a continuation from the produced data at execution time.

Example -- Console I/O

First we define some primitives:

let putc (c: char) (k: continuation) =
let write () = System.Console.Write(c);
execute k
K write
let getc (g: char -> continuation) =
let read () = execute (g <| System.Console.ReadKey(true).KeyChar)
K read;;
view raw gistfile1.fs hosted with ❤ by GitHub

where we introduce the standard convenience function

let (<|) f x = (f x);;
view raw gistfile1.fs hosted with ❤ by GitHub

to avoid extra parentheses.

We can then define a compound operation, for example:

let rec echo k =
let echoChar c = if c = '\r'
then putc '\n' k
else putc c (echo k)
getc echoChar;;
view raw gistfile1.fs hosted with ❤ by GitHub

Example -- The Lexer revisited

The type constructor lexerResult describes the values that the lexer is going to produce. Note that the lexer is going to be a producer continuation.

// Define a result type of a lexeme list or error message:
type lexerResult<'lexeme> =
Success of 'lexeme list
| Failure of string;;
view raw gistfile1.fs hosted with ❤ by GitHub

The type constructor lexerSystem provides us with an I/O abstraction.

// Define a type with a character source, a 'lexeme sink and an error string sink:
type lexerSystem<'lexeme> = {
input: unit -> char option;
output: 'lexeme -> unit;
error: string -> unit;
result: unit -> lexerResult<'lexeme> };;
view raw gistfile1.fs hosted with ❤ by GitHub

Given that the lexer is going to be a producer, we will need a consumer to complete its construction.

// Read the output result and continue with (g result)
let lexComplete s g =
let complete () =
g (s.result ())
K complete;;
view raw gistfile1.fs hosted with ❤ by GitHub

To allow for lexical analysis failures, we also have a building block for that.

// Set the output to the failure state and continue with k
let lexFail s e k =
let storeError () =
s.error e
execute k
K storeError;;
view raw gistfile1.fs hosted with ❤ by GitHub

Note that although it looks like a consumer, in practice we always know the message to use at construction time.

Function lexGetChar builds our most primitive producer continuation.
The consumer (g) constructs a continuation from the character option read from the input.

// s is our lexerSystem
let lexGetChar s g =
K (fun () -> execute (g (s.input())));;
view raw gistfile1.fs hosted with ❤ by GitHub

Similarly, lexPutLexeme builds our most primitive consumer, which writes a lexeme to the output.

let lexPutLexeme s lexeme (k: continuation) =
let storeLexeme () =
s.output lexeme;
execute k
K storeLexeme;;
view raw gistfile1.fs hosted with ❤ by GitHub

In order to make decisions based on our inputs we need to express a conditional branch.

// (^^) associates on the right, which means that we can chain instances without requiring nested brackets
let ( ^^ ) (test, g1) (g2: 'x -> continuation) x =
(if test x then g1 else g2) x;;
view raw gistfile1.fs hosted with ❤ by GitHub

and similarly we want to be able to express repetition.

let rec ( ^* ) (generator: ('x option -> continuation) -> continuation, test)
(consumer: 'x list -> 'x option -> continuation) =
generator (function Some x -> if test x
then (generator, test) ^* (fun l -> consumer(x::l))
else consumer [] (Some x)
| None -> consumer [] None);;
view raw gistfile1.fs hosted with ❤ by GitHub

Thus equipped we can start to write the lexer.

type LispLex = Bra | Ket | Dot | Str of string | Num of int | Symbol of string
let alpha c = ((c >= 'a') && (c <= 'z')) || ((c >= 'A') && (c <= 'Z'))
let digit c = (c >= '0') && (c <= '9')
let eqChar c c' = c = c'
let l2s l = String(List.toArray l)
let ( ^| ) p q = fun x -> (p x) || (q x)
// Read a symbol from the input
let readSymbol s char g =
(lexGetChar s, alpha ^| digit ^| eqChar '_') ^*
(fun l ->
fun opt ->
lexPutLexeme s (Symbol (l2s ([char]@l))) (g opt));;
view raw gistfile1.fs hosted with ❤ by GitHub

Exercise: Complete the lexer.


Summary

We have used closures as continuations to construct functions which compose using function application to create more complex functions. The end result is a complex closure, which is itself just a function which, when applied to the unit value “()”, is the program we want, with any side effects of the constituent closures. The main problem that we, as programmers, are likely to feel with this approach is that the process of construction is counter-intuitive, due to the back to front construction style, given that each step of the closure takes an argument representing the rest of the program, thus

type continuation = K of (unit -> unit)
step1: continuation -> continuation
end: continuation
program = step1 end : continuation
view raw gistfile1.fs hosted with ❤ by GitHub

As Eric Lippert put it (in the context of writing asynchronous code in this style in C#):

Remember what I was saying about the pros and cons of CPS?

  • PRO: Arbitrarily complex and interesting control flows can be built out of simple parts – check.
  • CON: The reification of control flow via continuations is hard to read and hard to reason about – check.
  • CON: The code that represents the mechanisms of control flow completely overwhelms the meaning of the code – check.
  • CON: The transformation of ordinary code control flow into CPS is the kind of thing that compilers are good at, and almost no one else is – check.

It is important to note, however, that this is not the only use of continuations. We regularly use continuations as event handlers, and they can also be used to search a set of alternatives, by effectively constructing a set of programs to be tried one after another until one does not fail provides a result, or all of them fail.

Used appropriately, this is a powerful technique.



Monads

Despite the scary name, monads are constructs that arise quite naturally in functional programming; typically as a result of trying to express the OO notion of a separation of concerns, here between individual functions doing localized transformations, and the larger control flow of the program.

A monad is a type the values of which encapsulate a piece of a program in much the same way as a continuation does. There are, however, differences in the way the values work and the way they compose, which fits into the functional programming style more naturally. We will, for the moment, write a monad as a type M t where t is a type which values of the monad produce when “run”, except when the run fails (in which case no value is produced). As with continuation programming, a program is constructed by the composition of monads and functions which produce monads. The basic constructions used are:

Return: t -> M t
Fail: M t
Bind: (t1 -> M t2) -> M t1 -> M t2
view raw gistfile1.fs hosted with ❤ by GitHub

Understanding “bind”

The “bind” combinator is often written as the infix operator >>=. This begins to make sense if we use a program layout thus:

m >>= fun value1 ->
E1[value1] >>= fun value2 ->
E2[value2]
view raw gistfile1.fs hosted with ❤ by GitHub

We read this as binding value1 to the output(s) of m, using this to produce the monad E1[value1], and then binding value2 to the outputs of that, constructing the monad E2[value2], and so on. A language with monads built in can even provide syntactic sugar for this, such as:

value1 := m;
value2 := E1[value1]
E2[value2]
view raw gistfile1.fs hosted with ❤ by GitHub

Unfortunately the F# type system is not expressive enough for us to write a general >>= along the lines of

let __binder__<'T, 'U, 'M when 'M : (static member bind : ('T -> 'U 'M ) -> 'T 'M -> 'U 'M)> (x: 'T 'M) (f : 'T -> 'U 'M) =
x |> 'M.bind f;;
let (>>=) = __binder__;;
view raw gistfile1.fs hosted with ❤ by GitHub

so for the moment we will write >>= but mean the comparable operator specialised to the monad of interest.

Derived combinator cases

A monad type M unit is special (think of it as pure side-effect). With such a type there little point in binding the result values – we know that we will only get ():

let (>>) m1 m2 = m1 >>= fun () -> m2;;
view raw gistfile1.fs hosted with ❤ by GitHub

A monad which produces a function or functions is also interesting:

let Apply m1 m2
= m1 >>= fun f ->
m2 >>= fun x ->
Return (f x);;
view raw gistfile1.fs hosted with ❤ by GitHub

Given a function we can construct a function over monads:

let Lift f m =
m >>= fun x ->
Return (f x);;
view raw gistfile1.fs hosted with ❤ by GitHub

Exercise: Prove that Lift f = Apply (Return f)

Given two functions that return monads, we can combine them:

let (>=>) f g
= fun x -> (f x) >>= g;;
view raw gistfile1.fs hosted with ❤ by GitHub

A monad which produces a monad or monads can be flattened (lose a level of monad):

let Join mm
= mm >>= fun x -> x;;
view raw gistfile1.fs hosted with ❤ by GitHub

The name join makes sense when the monad involved produces multiple values – the result monad produces all of the values produced by all of the monads produced by mm.

Given a list of monads we can construct a monad of lists:

let rec Sequence l
= match l with
| [] -> Return []
| h::t -> h >>= fun h1 -> (Sequence t) >>= fun t1 -> Return (h1::t1);;
view raw gistfile1.fs hosted with ❤ by GitHub

An example of a monad

We introduced the notion of a monad with the description that it handles things like separation of concerns. One very familiar language construct that we have been using from the very first session can be written in monadic form; a 'a list is just another monad 'a:

let ReturnList x = [x]
let FailList = []
let (>>=*) m f = List.concat (List.map f m);;
view raw gistfile1.fs hosted with ❤ by GitHub

Another pure monad construction is Option.

let ReturnOption x = Some x
let FailOption = None
let (>>=?) m f = Option.bind f m;;
view raw gistfile1.fs hosted with ❤ by GitHub

which allows us to chain together a series of operations that might fail, and bypass the calls made after any failure.

Exercise: Construct a pure monad based on trees.


An I/O Monad

These pure monads do not really have a notion of execution. We can use closures again to construct a monad which itself carries out some side-effecting computation and returns a value. As we did for continuations, we will wrap the closures up using a data type (as before this is to highlight the construction and can be elided):

type 'a io = IO of (unit -> 'a)
exception IOFailure
let RunIO (IO f) = f();;
view raw gistfile1.fs hosted with ❤ by GitHub

The principal functions for this are:

let ReturnIO x = IO (fun () -> x)
let FailIO = IO (fun () -> raise IOFailure)
let (>>=) m f = IO (fun () -> RunIO(f(RunIO m)));;
view raw gistfile1.fs hosted with ❤ by GitHub

The construction for bind (>>=) deliberately constructs a wrapped closure. At first sight it looks the same as f(runIO m), which has the same type, but we need to make sure that the sub-term runIO m is evaluated when the result of the bind is run rather than when it is constructed. As all of the other combinators are defined in terms of these three, we won’t need to worry about evaluation order for them. The issue will arise, however, for most special primitives, as we will want side effects to occur at the right time.

To read and write characters we can use:

let GetC = IO (fun () -> System.Console.ReadKey(true).KeyChar)
let PutC (c:char) = IO (fun () -> System.Console.Write c) ;;
view raw gistfile1.fs hosted with ❤ by GitHub

We can, of course, string a lot of these together using the derived combinators >>= and >>. The result does nothing until run. For example:

let EchoC = GetC >>= PutC;;
view raw gistfile1.fs hosted with ❤ by GitHub

We can also define read and write for strings:

let GetS length =
let rec get chars count =
if count > 0
then GetC >>= fun c -> get (chars@[c]) (count – 1)
else Return IO String(List.toArray chars)
get [] length;;
view raw gistfile1.fs hosted with ❤ by GitHub

Exercise: write PutS.


Computation expressions

F# does provide a form of syntactic sugar that allows us to write monadic code, and continuation passing, in a form that is actually comprehensible to the programmer.

Consider the Option monad. If we introduce a singleton type

type OptionBuilder() =
member self.Bind(input, operation) = Option.bind operation input
member self.Delay (operation:(unit -> 'a option)) = operation ()
member self.Return input = Some input
member self.ReturnFrom (input : 'a option) = input
let option = OptionBuilder();;
view raw gistfile1.fs hosted with ❤ by GitHub

where Delay has the effect (as in the >>= operation for our I/O monad) of delaying the execution of the operation until the appropriate point in the sequence, we can write the monadic code using chains of >>=? combinators as a sequence of assignment-like let! expressions.

let result = option {
let! y = f x
let! z = g y
return! h z};;
view raw gistfile1.fs hosted with ❤ by GitHub

where f, g, h are functions taking plain values to option values.

The actual compiled form of the let! binding

builder {let! pattern = expr
compound_expr }
view raw gistfile1.fs hosted with ❤ by GitHub

is in the continuation form

builder.Bind(expr, (fun pattern -> compound_expr))
view raw gistfile1.fs hosted with ❤ by GitHub

This not only allows us to write monadic code in something close to the syntactic sugar form as alluded to above, but also to express naturally continuation based code (e.g. asynchronous operations) in a sequential style (indeed the async computation builder is a standard part of the library for exactly this purpose).

We can also express the equivalence of the two notations by building a monadic form from a computation expression, such as the async computation builder:

let ReturnAsync x = async { return x }
let (>>=!) m f = async { return Async.RunSynchronously(f(Async.RunSynchronously m)) }
let (>>!) m1 m2 = m1 >>=! fun () -> m2
let (>=>!) f g = fun x -> ((f x) >>=! g)
let ApplyAsync m1 m2 =
m1 >>=! fun f ->
m2 >>=! fun x ->
ReturnAsync (f x)
let LiftAsync f = ApplyAsync (ReturnAsync f)
let SequenceAsync (l : _ list) =
Async.Parallel l >>=! fun a ->
ReturnAsync(Array.toList a);;
view raw gistfile1.fs hosted with ❤ by GitHub

Concluding Remarks

Now, like Wittgenstein's Ladder once you have grokked the content of this course, you should be able to discard the often inefficient implementations used for simplicity (such as avoiding cluttering the example code with reversing cons'd lists), and be confident in using a functional style where complex programs can be built from simple pure elements composed together.

As to the F# language itself, we have barely skimmed the surface -- material like active patterns, quotations (the missing last lecture), the standard collections modules, reactive events and more still remain untouched. But in my experience, once the initial hurdle of the programming style has been crossed, picking those up when you realise you need them is the easy part; whereas before the "Aha!" moment the motivation and need for such features will remain mostly obscure, hidden behind the dilemma of "how do you program when your variables don't?" .

Exercise: Test your understanding by reworking the course material for another language of your choice. Note that for some functional languages, parts will not be applicable as they stand (e.g. neither assignment nor the shared-state queue will be expressible in Erlang); and that for many languages pattern based dispatch and infix notation will not be available.



Sunday, December 29, 2013

An introduction to Functional Programming in F# -- Part 5: Objects and Exceptions

This session marks a pause for breath between the introductory parts where we concentrated on doing interesting work with mostly simple immutable lists and with algebraic datatypes, and the last part which will be all heavy lifting, to look at other ways of representing compound data, and how the functional world interacts with the more familiar parts of the .net framework. Also, this week, no exercises either.

I'm keeping this part in mainly for completeness' sake, as a transcription of the original seminar material from '08; rather than for any great insights it offers. In particular, unlike the previous sessions there won't be any of the two steps to comprehend -- first, how the program works, and then with increasing familiarity, why you wouldn't actually implement things in quite that way in a real program. It'll be more like a shopping list instead.


Topics covered in the series


Assignment

Actually we covered assignment and mutability last time.


Tuples

We used tuples -- collections of a fixed number of items, defined by the types of each item, way back in the first session. Recall

let Half_Adder x y = (XOR x y, AND x y);;
view raw gistfile1.fs hosted with ❤ by GitHub

The value returned by this function is a 2-tuple of the Bits type -- written Bits * Bits; but in general the types in a tuple will be heterogeneous.

Tuples are usually best employed as ad hoc datastructures; while there are standard functions

let fst (a,b) = a;;
let snd (a,b) = b;;
view raw gistfile1.fs hosted with ❤ by GitHub

for extracting members of a 2-tuple, getting at parts of longer tuples will involve destructuring pattern matching.


Records

A record is more like a class -- or, really, a 'C'-style struct -- in having a series of named fields. We have actually used records in their simplest form, in some of our previous examples

type SEXPR = SYMBOL of string | NUMBER of int | NIL | CONS of (SEXPR * SEXPR) | COMPUTATION of COMP
and COMP = { func : SEXPR -> SEXPR } ;;
view raw gistfile1.fs hosted with ❤ by GitHub

because functions are just values, we can put them into a record just as well as other types. We can be more object like if we mix scalar and function values in a record:

type Point = { x: int; y: int; move: Point -> Point }
let rec makePoint vx vy =
let doMove p = makePoint (vx + p.x) (vy + p.y)
{ x = vx; y = vy; move = doMove };;
view raw gistfile1.fs hosted with ❤ by GitHub

which defines a 2D point type with a displacement-vector behaviour.

Records themselves are immutable, so we can only create new records based off old ones e.g.

let p = (makePoint 1 2).move(makePoint 2 3);;
view raw gistfile1.fs hosted with ❤ by GitHub

There is a copy but replace one or more fields syntax

{p with x = 6; y = 0 };;
view raw gistfile1.fs hosted with ❤ by GitHub

but this of course violates the contract that the move function displaces its target by the x,y value of the point. However, types can have methods

type Point = { mutable x: int; mutable y: int }
with member p.move q = p.x <- p.x + q.x;
p.y <- p.y + q.y
let makePoint x y = { x = x; y = y }
view raw gistfile1.fs hosted with ❤ by GitHub

and with the move member working on the current rather than creation time value of the coordinates, we have a class-like structure, albeit with public fields. And most of the time this is sufficient, if the type itself is not exposed.


Objects and interfaces

When interfacing with .net libraries, we need to define actual objects or interfaces. Our previous point behaviour can be expressed as an interface

type IPoint
= interface
abstract x: int;
abstract y: int;
abstract move: IPoint -> unit
end
view raw gistfile1.fs hosted with ❤ by GitHub

which we can implement as

type Point
= class
val mutable vx: int;
val mutable vy: int;
new(x,y) = { vx = x; vy = y }
interface IPoint
with member p.move q = p.vx <- p.vx + q.x;
p.vy <- p.vy + q.y
member p.x = p.vx
member p.y = p.vy
end
let makePoint x y = new Point(x,y) :> IPoint;;
view raw gistfile1.fs hosted with ❤ by GitHub

Note that F# syntax requires us to explicitly implement the interface; as a consequence, we also need to cast our concrete implementation explicitly to the interface type when we wish to refer to it as such.

There is an alternative syntax which removes the explicit new member as a constructor, and instead makes the class body act as a constructor function:

type Point(x,y)
= class
let mutable vx = x
let mutable vy = y
interface IPoint
with member p.move q = vx <- vx + q.x;
vy <- vy + q.y
member p.x = vx
member p.y = vy
end
view raw gistfile1.fs hosted with ❤ by GitHub

The two are similar, but not quite the same; for this second example vx and vy are not (as the result of pasting the code into the interactive prompt shows) instance variables, as they were before, but are in locals in the constructor function.

Alternatively, we can define an abstract base class, rather than an interface. This requires an attribute annotation, rather than a keyword:

[<AbstractClass>]
type AbstractPoint
= class
val mutable vx: int;
val mutable vy: int;
new(x,y) = { vx = x; vy = y }
member p.x with get() = p.vx and set z = p.vx <- z
member p.y with get() = p.vy and set z = p.vy <- z
abstract move: AbstractPoint -> unit
end
type Point'
= class
inherit AbstractPoint
new(x,y) = { inherit AbstractPoint(x,y) }
override p.move q = p.x <- p.x + q.x;
p.y <- p.y + q.y
end
let p = Point'(2,3)
let q = Point'(1,2)
p.move(q)
p.x
p.y;;
view raw gistfile1.fs hosted with ❤ by GitHub

Note that this time we don't have to cast anything to the base class in order to invoke the move method. A Point' is an AbstractPoint.


Exceptions

These have the sort of behaviour that we are familiar with

  • Exceptions do not affect the types of functions that throw them.
  • Exceptions “propagate up the call stack” until a handler is found (or to top level).

There are two built-in exception mechanisms

// Raise an ArgumentException:
invalid_arg "list must not be empty"
// Raise a FailureException:
failwith "internal error"
view raw gistfile1.fs hosted with ❤ by GitHub

or we can just raise a standard exception type

raise (InvalidOperationException("help!"))
view raw gistfile1.fs hosted with ❤ by GitHub

New exception types can be declared by subclassing, or by ML syntax

exception Special of int
raise (Special 99)
view raw gistfile1.fs hosted with ❤ by GitHub

Exceptions are handled much like they are in C#/C++/Java...

exception EmptyList of string
let rec folds g x
= match x
with [] -> raise (EmptyList "folds")
| [x] -> x
| h::t -> g h (folds g t)
try folds (fun x y -> x + y) []
with EmptyList m -> stdout.WriteLine(m);
0
view raw gistfile1.fs hosted with ❤ by GitHub

where the with construct actually allows pattern matching over the exception (so can be the equivalent of multiple catch clauses)

There is a related construct; a finally clause can also be used with try as in C#, to add some code which is always executed:

let resource = getResource()
in try folds (useResource resource) []
with EmptyList m -> stdout.WriteLine(m);
rethrow()
finally releaseResource resource
view raw gistfile1.fs hosted with ❤ by GitHub

though there is no portmanteau try ... catch ... finally as there is in C#.

Note that because we are in the .net framework here, exceptions are expensive operations, unlike in OCAML, where they are sufficiently cheap as to be a standard control flow construct.

Friday, December 20, 2013

An introduction to Functional Programming in F# -- Part 4: Concurrency and Networking

So far we have dealt with entirely pure functions and immutable data, where the program result might just as well be generated at compile-time, like a C++ template meta-program. This time we look at violating these constraints.

As has been the case for the previous posts, this discussion focuses on the principles of functional style and how to achieve the results we want from first principles, rather than going directly to library code, especially when introducing new concepts.

Topics covered in the series


Assignment

The type of variable that one is familiar with in languages like C# or Java is a reference -- the variable name refers to a box that contains a value. F# has an explicit form or type 'a ref to denote a box that can hold values that are of type 'a, e.g.

let state = ref 0;;
view raw gistfile1.fs hosted with ❤ by GitHub

where the integer zero is an immutable value all of its own.

We get the value out of the box by the function

(!): ‘a ref -> ‘a
!state;;
view raw gistfile1.fs hosted with ❤ by GitHub

We change the value in the box by the function

(:=) : ‘a ref -> ‘a -> unit
state := 1;;
view raw gistfile1.fs hosted with ❤ by GitHub

F# also offers the modifier mutable and the operator <-; a mutable value needs no special treatment to be read, but update is done using the <- operator. Using this we can emulate 'a ref as a record type (as we used to hold computations last time)

type 'a Ref = { mutable value : 'a }
let (!) (r : 'a Ref) = r.value
let (:=) (r : 'a Ref) (v:'a) = r.value <- v;;
view raw gistfile1.fs hosted with ❤ by GitHub

We will favour references for the rest of the post, as their mutability is explicit at all sites of use.


Local Persistent State

We can use references to maintain persistent state within program, and encapsulate it, without needing to yet reach for the explicitly object-oriented features of the language. Consider this function:

let gen f =
let i = ref 0
fun () -> (i := !i + 1; f !i);;
view raw gistfile1.txt hosted with ❤ by GitHub

the function gen produces a generator function that uses a private variable to track the number of times it is called, so producing f 1, then f 2, as in

let sq x = x*x;;
let g = gen sq;;
view raw gistfile1.fs hosted with ❤ by GitHub

where successive calls produce 1, 4, 9, 16...


Sharing a Reference Variable

If we have the following functions

let hd (x::xs) = x;;
let tl (x::xs) = xs;;
view raw gistfile1.fs hosted with ❤ by GitHub

to extract the head and the tail of a list, we can model a queue using shared mutable state as

let stream init =
let state = ref init
let put x = (state := (!state)@[x])
let get () = let r = hd !state
state := tl !state
r
let isEmpty () = !state = []
(put, get, isEmpty);;
view raw gistfile1.fs hosted with ❤ by GitHub

Here stream takes an initializing list, and returns functions add an item, remove an item and test for exhaustion of the queue.

Exercise: This is a naive implementation -- the behaviour of put is linear in the number of elements in the queue. Immutable queues are implemented as pairs of lists with occasional list reversals leading to amortized constant time to flow an item through the queue, and a similar internal representation could be used here. Rewrite stream to use such an implementation.


Simple concurrency

The problem with shared state as in our queue example above comes when multiple concurrent operations attempt to modify the state, rendering its output, or worse, its actual internal state, inconsistent. So, using the familiar thread and lock model for concurrency, we can introduce:

let spawn f = (new Thread(new ThreadStart(f))).Start();;
view raw gistfile1.fs hosted with ❤ by GitHub

for thread creation, and

let getLock () = new ReaderWriterLock();;
let acquireWrite (lock:ReaderWriterLock) = lock.AcquireWriterLock(-1);;
let releaseWrite (lock:ReaderWriterLock) = lock.ReleaseWriterLock();;
let acquireRead (lock:ReaderWriterLock) = lock.AcquireReaderLock(-1);;
let releaseRead (lock:ReaderWriterLock) = lock.ReleaseReaderLock();;
let upgrade (lock:ReaderWriterLock) = lock.UpgradeToWriterLock(-1);;
let downgrade (lock:ReaderWriterLock) cookie = lock.DowngradeFromWriterLock(cookie);;
view raw gistfile1.fs hosted with ❤ by GitHub

to manage locks.

Aside: we could use other concurrency APIs to initiate execution, e.g. Task; what is important here is that there is the possibility for concurrent access to any shared mutable state.

These operations block indefinitely, so prevent data races while offering the opportunity for deadlock instead.


A Concurrent Queue

With the lock operations, we can modify our queue to use a lock to manage access to the shared state, placing a write lock around the mutations and a read lock around the other accesses:

let stream init =
let state = ref init
let lock = getLock()
let put x = acquireWrite lock
state := (!state)@[x]
releaseWrite lock
let get () =
let r = ref None
acquireRead lock;
if !state = []
then releaseRead lock
!r
else let u = ref (upgrade lock)
r := Some(hd !state)
state := tl !state
downgrade lock u;
releaseRead lock
!r
in (put, get);;
view raw gistfile1.fs hosted with ❤ by GitHub

We also introduce the standard type 'a option which has the form

type 'a option = None | Some of 'a;;
view raw gistfile1.fs hosted with ❤ by GitHub

to signal the empty state in a more uniform way than having to test the previous isEmpty function.

Exercise: Add locking to the result of the previous exercise .

Aside: This style of code is of course not safe against exceptions that unwind the stack while a lock is being held.


Asynchronous Composition

We can connect two functions, producer and consumer, by a stream thus:

let ( ||> ) f g =
let (put,get) = stream []
let rec loop1 () = put( f ())
loop1 ()
let rec loop2 () =
match get() with
None -> ()
| Some(x) -> g x
loop2 ()
spawn loop1
spawn loop2;;
// for example:
let pp i = printfn "a number %d" i;;
(gen sq) ||> pp;;
view raw gistfile1.fs hosted with ❤ by GitHub

which will run printing squares until you halt the environment.


Functional networking

We can develop this idea further with the following concepts:

  • It’s asynchronous composition on different computers or between different processes on the same computer.
  • Make byte streams look like lists.
  • Build about the simple idea of null terminates strings as messages.

Depending how we make the connections, we can either model simple one-way messaging (or asynchronous messaging if each counter-party transmits on its own schedule), or request/response (or synchronous messaging).

Aside: This approach is akin to that used by Erlang, though this is only a very first approximation to the Erlang model.


Developing the network model

Taking advantage of the .net library methods that provide a stream-oriented view on a connection, we can write helper methods

let read (s:NetworkStream) =
let i = s.ReadByte()
if i < 0 then None else Some(byte i);;
let write b (s:NetworkStream) = s.WriteByte(b);;
let close (s:NetworkStream) = s.Close();;
view raw gistfile1.fs hosted with ❤ by GitHub

and make the interconversion with lists as

(* read a null terminated stream of bytes into a list *)
let rec streamToList s =
let b = read s
match b with
None -> []
| Some(x) -> if x = 0uy then [] else x::(streamToList s);;
(* write a list of bytes out as a stream *)
let rec listToStream l s =
match l with
[] -> ()
| x::xs -> (write x s; listToStream xs s);;
view raw gistfile1.fs hosted with ❤ by GitHub

A simple echo server and client

Using the same familiar techniques, we can build up string to byte-list inter-conversions, and put together functions to put a string onto a TCP connection and then read it off again. It's then a simple matter of calling the appropriate network APIs to open a listening socket and make connections for the duration of a request and response.

The example code at the end of the listing below spins the server code on a separate thread, allowing the client to execute in the same interactive session.

open System.Net;;
open System.Net.Sockets;;
open System.Threading;;
let addressFamily = AddressFamily.InterNetwork;;
// or AddressFamily.InterNetworkV6;;
let get_host_address() = Dns.GetHostEntry(Dns.GetHostName()).AddressList
|> Seq.filter (fun (x:IPAddress) -> x.AddressFamily = addressFamily)
|> Seq.head;
let local_host_address = get_host_address ();;
let spawn f =
(new Thread(new ThreadStart(f))).Start();;
let backlog = 2;;
let read (s:NetworkStream) =
let i = s.ReadByte()
if i < 0 then None else Some(int8 i);;
let write b (s:NetworkStream) = s.WriteByte(b);;
let close (s:NetworkStream) = s.Close();;
(* read a null terminated stream of bytes into a list *)
let rec streamToList s =
let b = read s
match b with
None -> []
|Some(x) -> if x = 0y then [] else (uint8 x)::(streamToList s);;
(* write a list of bytes out as a stream *)
let rec listToStream l s =
match l with
[] -> ()
|x::xs -> (write x s; listToStream xs s);;
(* -- create string variants for null terminated strings -- *)
let rec toList (i: string) =
if i.Length > 0
then ((i.Chars(0))::(toList (i.Substring(1))))
else [] ;;
let rec map f l =
match l with
x::xs -> (f x) :: (map f xs)
| [] -> [];;
let rec toString (ls: char list) =
match ls with
[] -> ""
|x::xs -> (string(x)) + (toString xs);;
let string_to_ascii (s:string) = System.Text.Encoding.ASCII.GetBytes(s);;
let ascii_to_string (b:byte[]) = System.Text.Encoding.ASCII.GetString(b);;
let charof b = System.Text.Encoding.ASCII.GetString([| b |]).Chars(0);;
let byteof c = System.Text.Encoding.ASCII.GetBytes([| c |] ).[0];;
let bytelist s = map byteof (toList s);;
let string bl = toString (map charof bl);;
let nullTerminated str = (bytelist str) @ [0uy];;
let writeString ns str = listToStream (nullTerminated str) ns;;
let readString ns = string (streamToList ns);;
let rec listen f (sock:Socket) =
let csock = sock.Accept()
spawn(fun () -> (f (readString (new NetworkStream(csock))))) ; listen f sock ;;
let rec echo f (sock:Socket) =
let csock = sock.Accept()
let ns = new NetworkStream(csock)
(spawn(fun () -> writeString ns (f (readString ns))) ; echo f sock) ;;
let listenOn f portnum =
let sock = new Socket(addressFamily, SocketType.Stream, ProtocolType.IP)
sock.Bind(new IPEndPoint(local_host_address, portnum));
sock.Listen(backlog);
listen f sock;;
let echoOn f portnum =
let sock = new Socket(addressFamily, SocketType.Stream, ProtocolType.IP)
sock.Bind(new IPEndPoint(local_host_address, portnum));
sock.Listen(backlog);
echo f sock;;
let asynchListenOn f portnum = spawn(fun () -> listenOn f portnum);;
let asynchEchoOn f portnum = spawn(fun () -> echoOn f portnum);;
(* ----- client bits ---*)
let message portnum mess =
let client_socket = new Socket(addressFamily, SocketType.Stream, ProtocolType.IP)
client_socket.Connect(local_host_address, portnum);
let client_stream = new NetworkStream(client_socket)
writeString client_stream mess;;
let converse portnum mess =
let client_socket = new Socket(addressFamily, SocketType.Stream, ProtocolType.IP)
client_socket.Connect(local_host_address, portnum);
let client_stream = new NetworkStream(client_socket)
writeString client_stream mess; readString client_stream;;
(* ----- example ---*)
// server =
let pp x = (printfn "%s" x; "Echo::" + x);;
asynchEchoOn pp 8001;;
// client =
converse 8001 "ahoy";;
view raw gistfile1.fs hosted with ❤ by GitHub

Wednesday, December 18, 2013

An introduction to Functional Programming in F# -- Part 3: Interpreters

Last time, we developed a parser for a simple LISP style language; now we develop from the abstract syntax tree the mechanism whereby we can actually evaluate expressions (programs) written in it.

Topics covered in the series


A simple language

We define the behaviour of our language on top of the abstract syntax thus:


Expr =Example
(LET list_of_dotted_pairs EXPR) |(Let ((x . 1) (y. 2)) (Add x y))
(IF EXPR EXPR EXPR) |(IF (EQ x y) x (CONS x y))
(LAMBDA list_of_variables EXPR) |(LAMBDA (x y) (ADD x y 1))
(REC variable list_of_variables EXPR) |(REC F (N) (IF (EQ x 0) 1 (MULT N (F (SUB N 1)))))
(QUOTE EXPR) |(QUOTE (1 2 3))
(EVAL EXPR) |(EVAL (QUOTE (ADD 1 2)))
(EVAL EXPR environment_expression) |(EVAL (QUOTE (ADD X 1)) (QUOTE ((X . 2))))
(EXPR EXPR*) |(ADD (LENGTH L) (LENGTH J) 3)
SYMBOL |X
NUMBER 3

where the coloured background denotes special forms (keyword related constructs) in the language, and the rest are call-by-value expressions. The lower-case descriptive names represent the following expansions

list_of_dotted_pairs = ((EXPR . EXPR)*)
list_of_variables = ( SYMBOL*)
variable = SYMBOL
environment_expression = EXPR

In code, we extend the algebraic type we used last time with the notion of a function

type SEXPR = SYMBOL of string | NUMBER of int | NIL | CONS of (SEXPR * SEXPR) | COMPUTATION of COMP
and COMP = { func : SEXPR -> SEXPR } ;;
view raw gistfile1.fs hosted with ❤ by GitHub

in the form of the COMP type; and modify the value of a SYMBOL to be a string.

Aside: the value of a COMP type is a record, a series of named fields gathered together. In this case there is one field, and that has a value that is a function mapping one SEXPR to another.

The Structure of the Evaluator

In pseudocode terms, given an expression as above, an interpreter would evaluate it with a recursive function eval that looks like this:

To eval an expression e in an environment (a construct where symbol definitions live) env

  • if e is NIL then NIL
  • else if e is a number n then n
  • else if e is a SYMBOL then look it up in the environment
  • else if e is a special form then evaluate special form
  • else if e is of form (expr1 expr2 ... exprN) then (eval expr1 env) (eval expr2 env) ... (eval exprN env)
  • else ERROR

The Structure of the Environment

We introduced the notion of an environment as where symbol definitions live; we express it as a look-up table from symbol to value like this (where we continue to use the list as our standard functional data-structure):

so that structurally the environment is an SEXPR which itself is a list of pairs of SYMBOL * SEXPR.

In code, the evaluator looks like this

let rec eval e x =
match x with
// Call by value cases
NIL -> NIL
|(SYMBOL s) -> lookup (SYMBOL s) e
|(NUMBER i) -> (NUMBER i)
// "IF" special form
|CONS((SYMBOL "IF"), b) -> match b with
CONS(cond, CONS(thenpart, CONS (elsepart, NIL))) ->
match (eval e cond) with
| NIL -> (eval e elsepart)
| _ -> (eval e thenpart)
| _ -> ERROR2 "IF" b
// "LET" special form
|CONS((SYMBOL "LET"), b) -> match b with
CONS(bindings, CONS(body, NIL)) ->
let pairs = applySnd (eval e) bindings
eval (APPEND pairs e) body
| _ -> ERROR2 "LET" b
// "LAMBDA" special form
|CONS((SYMBOL "LAMBDA"),b) -> match b with
CONS(formals, CONS(body, NIL)) ->
COMPUTATION {func = fun actuals -> (let pairs = ZIP formals actuals
eval (APPEND pairs e) body)}
| _ -> ERROR2 "LAMBDA" b
// "REC" special form
|CONS((SYMBOL "REC"),b) -> match b with
CONS(F, CONS(formals, CONS(body, NIL))) ->
let rec f =
COMPUTATION {func = fun actuals -> (let pairs = ZIP formals actuals
eval (APPEND pairs (CONS(CONS(F, f), e))) body)}
f
| _ -> ERROR2 "REC" b
// "QUOTE" special form
|CONS((SYMBOL "QUOTE"),CONS(X,NIL)) -> X
// EXTRAS
|CONS((SYMBOL "ENV"), NIL) -> e
|CONS((SYMBOL "EVAL"), CONS(X, NIL)) -> eval NIL (eval NIL X)
|CONS((SYMBOL "EVAL"), CONS(X, CONS(Y, NIL))) -> eval (eval NIL Y) (eval NIL X)
// Application
|CONS(a, b) -> apply (eval e a) (MAP (eval e) b)
|x -> ERROR1 x;;
view raw gistfile1.fs hosted with ❤ by GitHub

Note: Unlike our previous examples this doesn't just paste straight into the interpreter and work, as it has dangling references. The full set of new source appears at the end of this post.

The function starts out with the call by value cases from our pseudo-code; their behaviour should be obvious, given a suitable definition of lookup that walks the environment and returns the matching SEXPR or a suitable default if not found.


The Special forms

Our special form IF works like

eval (IF E1 E2 E3) env = if (eval E1 env) then (eval E2 env) else (eval E3 env)
view raw gistfile1.fs hosted with ❤ by GitHub

where we have made the design decision that all values are "truthy" except NIL. We have to match on our predicate because, as an implementation detail, we can't test functions for equality.

Our special form LET introduces functions applySnd which takes the list of dotted pairs represented as SEXPR * SEXPR, and recursively evaluates the second of each pair, as in

let rec applySnd f l =
match l with
[] -> []
| (x,y)::xs -> (x, f y) :: (applySnd f xs);;
view raw gistfile1.fs hosted with ❤ by GitHub

and APPEND which takes a list of SEXPR * SEXPR and returns that list merges that with the input environment (another list of SEXPR * SEXPR).

Having generated a new environment by merging all the evaluated bindings, i.e. (APPEND pairs e), it then evaluates the body of the expression in that new environment.

Our special form LAMBDA behaves similarly, creating a function that when executed makes local bindings from the formal parameters to the corresponding values (ZIP being the standard operation to map a pair of lists to a list of pairs), and uses that to define a new environment for evaluating the body.

The special form REC returns a self-referencing computation, but is otherwise similar to the LAMBDA form. Note that f is not itself a function; this use emphasises that the F# rec keyword merely brings a variable into scope inside its own definition. so the definition can be recursive.

The special form QUOTE returns the input expression unevaluated.


The "Extras", or "code is data is code"

The evaluator provides access to the environment from within the language; symbol ENV yields the environment as a list of dotted pairs.

The symbol EVAL with one argument evaluates an expression within the current environment for the interpreter, and with two arguments, allows a different environment to be specified. This is how data becomes code -- it is able to cause itself self to execute.


Remaining cases -- Functions and errors

Function application is modelled by taking a fully evaluated list, whose first element is expected to be a COMPUTATION and applying it to the remainder of the list of arguments.

In the code

apply (eval e a) (MAP (eval e) b)
view raw gistfile1.fs hosted with ❤ by GitHub

MAP applies the evaluator to each member of the list b to achieve this fully evaluated state; and apply looks like

let apply f l =
match f with
COMPUTATION { func = f' } -> f' l
| ERROR CONS(f, l);;
view raw gistfile1.fs hosted with ❤ by GitHub

Built-in functions are represented in this model by providing a standard environment with computations defined.

Anything else that is superficially syntactically valid but matches none of the expected forms results in an error.


Exercises

  • Modify the parser from last time to work with the new definition of SEXPR
  • Add Boolean operators AND, OR, NOT, EQUIV, IMPLIES to evaluator
  • Add built in functions to map between SYMBOLS and List of Characters and back again.
  • Add QUASIQUOTE (QQ), UNQUOTE with the meaning that (QQ (ADD (UNQUOTE X) (UNQUOTE Y) 23)) in an environment where X and Y are 123 and Z respectively evaluates to the S-expression (ADD 123 Z 23)
  • (Advanced): Change the evaluator to allow deal with interpreted code in the environment e.g. F is bound to (INTERPRET E) in env and (F X) is (eval E env)(eval X env). Alternatively extend the type of SEXPR to include an INTERPRETED constructor and do the same
  • (Advanced) Change the evaluator to allow special forms to be placed in the environment.

Putting it all together

I haven't copied over the parser from last time (that's the first exercise); but all the missing new bits are here.

type SEXPR = SYMBOL of string | NUMBER of int | NIL | CONS of (SEXPR * SEXPR) | COMPUTATION of COMP
and COMP = { func : SEXPR -> SEXPR } ;;
// Lexer and Parser go here
let EQUAL l1 l2 =
let rec tf s1 s2 = match (s1, s2) with
(SYMBOL xs, SYMBOL ys) -> xs = ys
| (NIL, NIL) -> true
| (NUMBER x, NUMBER y) -> x = y
| (CONS (a, b), CONS(c, d)) -> (tf a c) && (tf b d)
| _ -> false
if tf l1 l2 then (SYMBOL "TRUE") else NIL
let rec lookup x l =
match l with
CONS(CONS(a, b), c) -> match EQUAL x a with
NIL -> lookup x c
| _ -> b
| _ -> NIL;;
let rec MAP f l =
match l with
CONS(a, b) -> CONS(f a, MAP f b)
| _ -> NIL;;
let HD l =
match l with
CONS(a, b) -> a
| _ -> NIL;;
let TL l =
match l with
CONS(a, b) -> b
| _ -> NIL;;
let NULL l =
match l with
NIL -> (SYMBOL "TRUE")
| _ -> NIL;;
let ATOMP l =
match l with
(SYMBOL _) -> (SYMBOL "TRUE")
|(NUMBER _) -> (SYMBOL "TRUE")
| _ -> NIL;;
let SYMBOLP l =
match l with
(SYMBOL _) -> (SYMBOL "TRUE")
| _ -> NIL;;
let NUMBERP l =
match l with
(NUMBER _) -> (SYMBOL "TRUE")
| _ -> NIL;;
let LISTP l =
match l with
CONS(a, b) -> (SYMBOL "TRUE")
| NIL -> (SYMBOL "TRUE")
| _ -> NIL;;
let OPERATOR op k l =
let rec AUX l =
match l with
CONS((NUMBER a), b) -> op a (AUX (TL l))
| _ -> k
in NUMBER (AUX l) ;;
let ADD l = OPERATOR ( + ) 0 l;;
let MULT l = OPERATOR ( * ) 1 l;;
let SUB l = OPERATOR ( - ) 0 l;;
let DIV l = OPERATOR ( / ) 1 l;;
let ZEROP l =
match l with
(NUMBER 0) -> (SYMBOL "TRUE")
|_ -> NIL;;
let rec APPEND l1 l2 =
match l1 with
CONS(a, b) -> CONS(a, APPEND b l2)
| NIL -> l2
| _ -> NIL;;
let rec ZIP l1 l2 =
match (l1, l2) with
(CONS(x, xs), CONS(y, ys)) -> CONS(CONS(x, y), ZIP xs ys)
| _ -> NIL;;
let apply f l =
match f with
COMPUTATION g -> g.func l
| _ -> NIL;;
let rec applySnd f l =
match l with
CONS(CONS(n, v), r) -> CONS(CONS(n, f v), applySnd f r)
| _ -> NIL;;
let BHD X = HD(HD X);;
let BTL X = TL(HD X);;
let BCONS X = CONS(HD X, HD(TL X));;
let BNULL X = NULL(HD X);;
let BATOMP X = NULL(HD X);;
let BSYMBOLP X = SYMBOLP (HD X);;
let BNUMBERP X = NUMBERP(HD X);;
let BLISTP X = LISTP(HD X);;
let BEQUAL X = EQUAL(HD X) (HD (TL X));;
let BADD X = ADD X;;
let BMULT X = MULT X;;
let BSUB X = SUB X;;
let BDIV X = DIV X;;
let BZEROP X = ZEROP (HD X);;
let builtin =
[SYMBOL "TRUE", SYMBOL "TRUE";
SYMBOL "FALSE", NIL;
SYMBOL "HD", COMPUTATION{func = BHD};
SYMBOL "TL", COMPUTATION{func = BTL};
SYMBOL "CONS", COMPUTATION{func = BCONS};
SYMBOL "NULL", COMPUTATION{func = BNULL};
SYMBOL "ATOMP", COMPUTATION{func = BATOMP};
SYMBOL "SYMBOLP", COMPUTATION{func = BSYMBOLP};
SYMBOL "NUMBERP", COMPUTATION{func = BNUMBERP};
SYMBOL "LISTP", COMPUTATION{func = BLISTP};
SYMBOL "EQUAL", COMPUTATION{func = BEQUAL};
SYMBOL "ADD", COMPUTATION{func = BADD};
SYMBOL "MULT", COMPUTATION{func = BMULT};
SYMBOL "SUB", COMPUTATION{func = BSUB};
SYMBOL "DIV", COMPUTATION{func = BDIV};
SYMBOL "ZEROP", COMPUTATION{func = BZEROP}];;
let rec makeEnv l =
match l with
[] -> NIL
|(x1, x2)::xs -> CONS(CONS(x1, x2), makeEnv xs);;
let BASE = makeEnv builtin;;
(* add error routines so that errors in secial forms report error and
impose error propogation in eval *)
let ERROR1 x = CONS((SYMBOL "ERROR"), CONS(x, NIL));;
let ERROR2 name x = CONS((SYMBOL "ERROR"), CONS(CONS(SYMBOL name, x), NIL));;
let rec eval e x =
match x with
(SYMBOL s) -> lookup (SYMBOL s) e
|(NUMBER i) -> (NUMBER i)
|NIL -> NIL
|CONS((SYMBOL "IF"), b) -> match b with
CONS(cond, CONS(thenpart, CONS(elsepart, NIL))) ->
match (eval e cond) with
NIL -> (eval e elsepart)
| _ -> (eval e thenpart)
| _ -> ERROR2 "IF" b
|CONS((SYMBOL "LET"), b) -> match b with
CONS(bindings, CONS(body, NIL)) ->
let pairs = applySnd (eval e) bindings
in
eval (APPEND pairs e) body
| _ -> ERROR2 "LET" b
|CONS((SYMBOL "LAMBDA"), b) -> match b with
CONS(formals, CONS(body, NIL)) ->
COMPUTATION {func = (fun actuals -> let pairs = ZIP formals actuals
eval (APPEND pairs e) body)}
| _ -> ERROR2 "LAMBDA" b
|CONS((SYMBOL "REC"), b) -> match b with
CONS(F, CONS(formals, CONS(body, NIL))) ->
let rec f =
COMPUTATION {func = (fun actuals -> let pairs = ZIP formals actuals
eval (APPEND pairs (CONS(CONS(F, f), e))) body)}
f
| _ -> ERROR2 "REC" b
|CONS((SYMBOL "QUOTE"), CONS(X, NIL)) -> X
|CONS((SYMBOL "ENV"), NIL) -> e
|CONS((SYMBOL "EVAL"), CONS(X, NIL)) -> eval e (eval e X)
|CONS((SYMBOL "EVAL"), CONS(X, CONS(Y, NIL))) -> eval (eval e Y) (eval e X)
|CONS(a, b) -> apply (eval e a) (MAP (eval e) b)
| X -> ERROR1 X;;
(* tidy up *)
let exec str = eval BASE (parse str);;
(* tests
exec "(LET ((F . (LAMBDA (X) (MULT X X)))) (F 6))";;
exec "(LET ((F . (REC F (N) (IF (EQUAL N 0) 1 (MULT N (F (SUB N 1))))) )) (F 6))";;
exec "(LET ((LEN . (REC F (L) (IF (EQUAL L NIL) 0 (ADD 1 (F (TL L))) ) ) ))
(LEN (QUOTE (1 2 3))))";;
*)
view raw gistfile1.fs hosted with ❤ by GitHub

Wednesday, December 11, 2013

An introduction to Functional Programming in F# -- Part 2: Lexers and Parsers

I hadn't meant this to be a weekly series, but that seems to be the tempo at which I'm transcribing things. So, without further ado, Part Two!

Topics covered in the series

  • Introduction to Functional Programming via circuits
  • Lexing and Parsing (this part)
  • Interpreters
  • Simple ‘Functional’ Distributed/Concurrent Programming
  • Non-Functional bits - Assignment, Exceptions, Objects, Libraries, etc.
  • Advanced stuff - Continuations, Monads, Generative Programming (Language Based programming), etc.

Our example

To illustrate the functional approach to the common problem of interpreting streams of data, this post will cover the lexing (or tokenizing) and parsing (rendering into syntactic form) of a language with a simple grammar. And there are no prizes for guessing that this simple language is LISP, with expressions looking like

  • (mult (add x 5) factor)
  • (bind ((x. 1) (y . 2) (z . 3)) (add x y z))
  • (defun append (x y)
            (if (null x) y
                 (cons (head x) (append (tail x) y))))

Schematically, what we want to build looks like

taking characters to lexemes to syntax tree.


Lexers

The lexer for our simple grammar is a state machine that looks like this.

It operates over a stream of characters, emitting tokens on the transitions marked with "⇒"

To start coding, we assume that the input is supplied to us as a string; and our first requirement is to turn this into a char list, which we can do, following the approach of recursive decomposition as introduced in the previous part:

let rec toList (i: string) =
if i.Length > 0
then (i.Chars(0))::(toList (i.Substring(1)))
else [] ;;
view raw gistfile1.fs hosted with ❤ by GitHub
Digression - purity of concept vs pragmatism

We are, in this post, working from first principles throughout, rather than going to library code -- the availability of mature facilities such as fslex/fsyacc or FParsec would reduce this example into just the incantations to initialise those.

This toList implementation is intended as a reminder of the technique used, without diverting into implementation detail. We can see that the repeated creation of temporary strings, for example, would be a potential performance issue in practice. Indeed, what one might write in practice would be more like

let toList s = (s :> char seq) |> Seq.toList;;
view raw gistfile1.fs hosted with ❤ by GitHub

but objects and modules come later in the series.


The basic tools


To describe our tokens, we introduce an algebraic type which, as well as being symbolic (as our Bits type before), also includes data

type lexical = BRA | KET | DOT | SYMATOM of char list | NUMATOM of int ;;
view raw gistfile1.fs hosted with ❤ by GitHub

-- this defines our lexemes. Then to interpret the input data, define the list of separators (referred to as stopcases in the state diagram above)

let stopcases = toList "().\n\t " ;;
view raw gistfile1.fs hosted with ❤ by GitHub

some of which are meaningful tokens in their own right, and digits

let digits = toList "1234567890" ;;
view raw gistfile1.fs hosted with ❤ by GitHub

We can then define predicates to identify the type of an input character. We build a method to test for membership in a list, and use that to define the tests for each type.

let rec mem x l =
match l with
[] -> false
| (y::ys) -> if x = y then true else mem x ys ;;
let stopcase x = mem x stopcases ;;
let digit x = mem x digits ;;
let alpha x = not(mem x (digits @ stopcases)) ;;
view raw gistfile1.fs hosted with ❤ by GitHub

We can also take digit streams and turn them into numeric values

let digitToint i =
match i with
'0' -> 0
| '1' -> 1
| '2' -> 2
| '3' -> 3
| '4' -> 4
| '5' -> 5
| '6' -> 6
| '7' -> 7
| '8' -> 8
| '9' -> 9
| (_) -> 0 ;;
let mkint s =
let rec mkintAux s a =
match s with
[] -> a
| (x::xs) -> mkintAux xs ((a*10) + (digitToint x))
mkintAux s 0;;
view raw gistfile1.fs hosted with ❤ by GitHub

The component lexers

We can now write the code representing the individual state transitions for symbol and number lexing

let rec lexSym s l =
match l with
[] -> (SYMATOM s, [])
| (x::xs) -> if stopcase x then (SYMATOM s, x::xs)
else lexSym (s@[x]) xs ;;
view raw gistfile1.fs hosted with ❤ by GitHub

which takes a list, and returns the lexeme and the rest of the list. The case for numbers is more complicated, since we have to deal with the case where an embedded letter makes the whole thing a symbol

let rec lexNumOrSym s l =
match l with
[]-> (NUMATOM(mkint s), [])
|(x::xs) ->
if (stopcase x) then ( NUMATOM(mkint s), x::xs)
else if (alpha x) then lexSym (s@[x]) xs
else lexNumOrSym (s@[x]) xs ;;
view raw gistfile1.fs hosted with ❤ by GitHub

all that remains is to assemble the pieces

let rec lexer l =
match l with
| [] -> []
| '\n'::xs -> lexer xs
| ' '::xs -> lexer xs
| '\t'::xs -> lexer xs
| '('::xs -> BRA::(lexer xs)
| ')'::xs -> KET::(lexer xs)
| '.'::xs -> DOT::(lexer xs)
| x::xs ->
let (l,r) = if alpha x then (lexSym [x] xs) else (lexNumOrSym [x] xs)
l :: lexer r ;;
view raw gistfile1.fs hosted with ❤ by GitHub

Exercise: Write a lexer for .ini files (see http://en.wikipedia.org/wiki/INI_file).


The component parsers

The syntax we want to develop to distinguish our LISP from arbitrary streams of lemexes looks looks like this

  • SEXPR = ATOM |
            BRA SEXPR DOT SEXPR KET |
            BRA SEXPR* KET
  • ATOM =  SYMBOL | NUMBER 

The approach we take is to translate the grammar directly into a program by creating elementary parsers for bra, ket, dot, symbols, atoms etc. and combine them by operations of sequencing, alternation and iteration.


In the same way that the lexer worked in terms of a list of lexemes and the input yet to process, we define our parse state as the tuple (remaining input to be parsed, list of results so far); a parsing operation will either transform a parse state to another (moving data from input to output) or explicitly fail

Consider the case (mult 1 2) and assume BRA has been read already. The input parse state looks like ( [SYMATOM mult ; NUMATOM 1; NUMATOM 2; KET] , _)
where we use the _ as a conventional wildcard, and we want our symbol parser to return ( [NUMATOM 1; NUMATOM 2; KET ] , [SYMATOM mult] ) on success.

Rendering this as code

type 'a Maybe = Return of 'a | Fail;;
view raw gistfile1.fs hosted with ❤ by GitHub

creates a type that captures the idea of a success with some meaningful value or failure with none. Then we define the parser we want to handle the state where we expect the next lexeme to be a symbol as

let parseSym s =
match s with
Return((SYMATOM y)::ys, (_)) -> Return(ys,[SYMATOM y])
| (_) -> Fail ;
view raw gistfile1.fs hosted with ❤ by GitHub

which we see will do the job (modulo the wrapping in a Return) that we specified above.


The parser for a number will be much the same

let parseNum s =
match s with
Return((NUMATOM y)::ys, a) -> Return(ys,[NUMATOM y])
| (_) -> Fail ;;
view raw gistfile1.fs hosted with ❤ by GitHub

and for all the simple tokens we can write a general parser

let parseToken x s =
match s with
Return(y::ys, a) -> if x = y then Return(ys, []) else Fail
| (_) -> Fail ;;
view raw gistfile1.fs hosted with ❤ by GitHub

from which we produce the specific parsers

let bra x = parseToken BRA x;;
let ket x = parseToken KET x;;
let dot x = parseToken DOT x;;
view raw gistfile1.fs hosted with ❤ by GitHub

Combining the simple parsers

To express the notion that there is more than one valid lexeme at a particular point define

let (||) f g x =
let result = f x
if result = Fail then (g x) else result ;;
view raw gistfile1.fs hosted with ❤ by GitHub

to express the notion of "Try A if it Fails Try B".

Similarly, if we expect two particular results in succession we can define

let (>>) f g x =
let u = f x
match u with
Return(rest1, result1)
-> let v = g u
match v with
Return(rest2, result2) -> Return(rest2, result1 @ result2)
| Fail -> Fail
| Fail -> Fail ;;
view raw gistfile1.fs hosted with ❤ by GitHub

in words, "Try A if it fails the whole thing fails; if it succeeds then try B on the new parse state; if B fails the whole thing fails."

Then when we expect a number of the same thing in sequence

let rec ITERATE f x =
let u = f x
match u with
Fail -> x
| Return(rest2, result2) ->
match x with
Fail -> Fail
| Return(rest1, result1) ->
ITERATE f (Return(rest2, result1 @ result2));;
view raw gistfile1.fs hosted with ❤ by GitHub

"Try A to match the longest possible sequence of A’s, including a zero length sequence."


Exercise: Write the operator OPT such that OPT A matches zero or one occurrences of A (note A is a parser!); i.e. A is OPTIONAL.


Exercise: Write the operator PLUS such that PLUS A matches one or more occurrences of A.


Transforms

So far parsers only check the structure of an expression – we also want them to build an abstract structure corresponding to the string, with general form.


  • SEXPR = ATOM ⇒ ATOM-ACTION |
            BRA SEXPR DOT SEXPR KET ⇒ PAIR-ACTION |
            BRA SEXPR* KET ⇒ LIST-ACTION 
  • ATOM =  SYMBOL ⇒ SYMBOL-ACTION |
            NUMBER  ⇒ NUMBER-ACTION 

Transforms take a parse state and build a parse state of a different type. Where our old Parse-State was (remaining input to be parsed, list of lexical items so far) we ant to construct a
new Parse-State of the form (remaining input to be parsed, list of Lisp Trees (i.e. items built from ATOMS,NIL and CONS)).

In code, we define a representation of an S-expression as

type SEXPR = SYMBOL of char list | NUMBER of int | NIL | CONS of SEXPR * SEXPR ;;
view raw gistfile1.fs hosted with ❤ by GitHub

and the sort of syntax tree structure we want to build in terms of this type will look like


The general shape of a transform is

let (>:) f h s =
let u = f s
match u with
Return(rest, result)
-> Return(rest, h result)
| Fail -> Fail ;;
view raw gistfile1.fs hosted with ❤ by GitHub

which parses input according to f then transforms the parse state according to h.


The grammar we need to make the shape of the figure above is

  • SEXPR = ATOM ⇒ ATOM-ACTION |
            BRA SEXPR DOT SEXPR KET ⇒ CONS-PAIR |
            BRA SEXPR* KET ⇒ CONS-LIST 
  • ATOM =  SYMBOL ⇒ ATOM-ACTION |
            NUMBER  ⇒ ATOM-ACTION 

where the operations are

  • CONSPAIR – produce a singleton list containing a CONS of the items
  • CONSLIST – produce an explicit CONS cell list containing the items
  • ATOMACTION – translate base cases between data types.

or, in code:

let CONSPAIR [x;y] = [CONS(x,y)];;
let rec CONSLIST l =
let rec conslist y =
match y with
x::xs -> CONS(x, conslist xs)
| [] -> NIL
[conslist l];;
let ATOMACTION v =
match v with
[SYMATOM u] -> [SYMBOL u]
| [NUMATOM i] -> [NUMBER i]
| (_) -> [];;
view raw gistfile1.fs hosted with ❤ by GitHub

The Final Parser

After all the hard work, putting it together is rather anticlimactic. We define a helper

let parseAtom s = ((parseSym >: ATOMACTION) || (parseNum >: ATOMACTION)) s ;;
view raw gistfile1.fs hosted with ❤ by GitHub

Finally, putting it all into one bundle, of type string -> SEXPR

let parse str =
let r = parseSEXPR (Return (lexer (toList str), []))
match r with
Return(_ , result::_) -> result
| Return(_, []) -> NIL
| Fail -> NIL ;;
view raw gistfile1.fs hosted with ❤ by GitHub

Exercise: Write a parser for an .ini file

Here's the sort of grammar to use:


  • File = Line*
  • Line = WHITESPACE CMD ‘NL’
  • WHITESPACE = (‘NL’|’TAB’|SPACE)*
  • CMD = SectionHeader | Binding | Comment
  • SectionHeader = ‘[‘ HeaderString ’]’
  • Binding = BinderString ‘=‘BinderString
  • HeaderString = anything but [ or ] or newline
  • BinderString = anything but = or newline | Quote
  • Quote = ‘”’ anything but quotes ‘”’
  • Comment = ‘;’ ANY ‘NL’

Exercise: What real-world performance implications are there in the use of this sort of parser? Re-implement one of the parsers to avoid the limitations of the strictly evaluated list datatype.



Wednesday, December 04, 2013

An introduction to Functional Programming in F# -- Part 1 : Functional Programming via circuits

The other day I came across the notes from a series of informal lunchtime seminars that I organised about five years ago, and thought it would be worth putting them into order and posting them (actually à propos of a recent question about lexing from first principles on the MSDN F# forum, a topic which we will come to in a subsequent post).

Credit where credit is due

My thanks to co-conspirators Will Harwood and Laurence Jordan who did all the heavy lifting to prepare the material for the original set of seminars.

Topics covered in the series

Scope

This series covers functional-programming concepts, as illustrated in F# syntax -- a way for beginners who have started to use the language to avoid just writing FORTRAN C# in any other language. It is not intended as an introductory guide to the language syntax itself, though I'll put in asides about the notations used where appropriate.

F#'s Background

F# descends from the ML series of languages; The initial ML was originally developed as part of the LCF project (1978)‏ by Milner et al.; this split into separate strands (SML; and Cambridge ML, or CAML). CAML development proceeded at INRIA, grew objects in the OCAML variant from the turn of the century, and that in turn provided the basis for F#, which has continued to evolve in what is now more than a decade since Don Syme first brought us that language.

As an ML language, we expect functions to be first-class objects, like any other type (passed as arguments; returned from functions; stored in data-structures; nested inside other functions, where they are lexically scoped, just like any other local variable); and that evaluation is strict (arguments are evaluated before a function call is entered).

Similarly, we expect a static type system that includes polymorphic types, but that type inference will actually lift the burden of pervasive type annotations from the programmer. Most types -- built-in, supplied in libraries or user-defined -- will have immutable value behaviour, though reference types exist that can provided controlled destructive update.

And, while these are standard fare nowadays, we expect the language environment to provide garbage collection and exception handling.

Digital Circuits

Now we actually start to use the language; these examples are intended to execute in the F# interactive console, so are punctuated by the ";;" syntax which is an ML end-of-expression mark, usually optional in project-based code, but needed in the interactive case as a way of signalling that there is now a compilable unit to deal with.

We can use F# to model simple logic gates (1 or 2 signals of either 0 or 1 in, and some Boolean function -- AND, OR, XOR,...).

// introduce simple type definitions
type Bits = Zero | One;;
// introduce functions over type by pattern matching
let AND x y =
match (x,y) with
(Zero,Zero) -> Zero
|(Zero, One) -> Zero
|(One, Zero) -> Zero
|(One, One) -> One ;;
let XOR x y =
match (x,y) with
(Zero,Zero) -> Zero
|(Zero, One) -> One
|(One, Zero) -> One
|(One, One) -> Zero;;
view raw gistfile1.fs hosted with ❤ by GitHub

Here we have defined an algebraic data-type (similar to an enum) and two functions that take a pair of values to return a single result. The functions themselves use match expressions (roughly, what a switch statement wants to be when it grows up) to say "if the input looks like this, then the result must be that".

Exercise: Define the standard Bit functions: OR, NOT, NAND, NOR

Binary Addition

An adder is a circuit that takes two streams of bits and produces a steam of bits as the answer

In standard fashion, we build an adder from two half-adders (which take two inputs and produce the XOR as the sum stream and the AND as the carry stream).

One half-adder takes the two inputs; the sum stream from that is passed into the second half-adder along with the previous carry bit, and the resulting sum forms the final output. Meanwhile the two intermediate carry streams are OR'd together to form the full carry stream.

// Introducing function definitions by composition
let Half_Adder x y = (XOR x y, AND x y);;
let Full_Adder x y carry =
let (s1, c1) = Half_Adder x y
let (s2, c2) = Half_Adder s1 carry
(s2, OR c1 c2);;
view raw gistfile1.fs hosted with ❤ by GitHub

Exercise: Write Full_Adder as a ‘pattern match’ function definition (i.e. substitute and simplify)

Some Basic Tools: Lists and Streams

A well typed list is a sequence of values where all the values are of the same type,say, 'a. This is an 'a list e.g. Bit list in ML notation (F# allows the more C++-like synonym list<'a> for convenience). A list is either empty or it is a pair of values composed of a value of type 'a and an 'a list.

// Introducing the built in type of lists: this is both recursive and polymorphic
// type 'a list = [] | (::) of 'a * 'a list
view raw gistfile1.fs hosted with ❤ by GitHub

We can define functions operating on lists by exploiting the structural recursion -- define a function by what it must do on each clause of the type definition. Example: computing the length of a list as

  • The length of an empty list is 0
  • The length of a non-empty list is 1 + the length of the second list
// simple recursive function definition using structural recursion over the list structure
let rec length l =
match l with
[] -> 0
| (x::xs) -> 1 + (length xs);;
view raw gistfile1.fs hosted with ❤ by GitHub

Here the rec keyword has the effect of bringing the function's own name into scope inside itself, thus allowing the recursive call.

We can perform more complex operations such as

let rec append l1 l2 =
match l1 with
[] -> l2
| (x::xs) -> x::(append xs l2) ;;
view raw gistfile1.txt hosted with ❤ by GitHub

to create the concatenation of two lists, or

//simple and inefficient reverse by structural recursion
let rec rev l =
match l with
[] -> []
| (x::xs) -> append (rev xs) [x];;
view raw gistfile1.fs hosted with ❤ by GitHub

Observe that in each case, the original lists are left undisturbed (there is no forward pointer pointing towards the head of the list, so in the append case, l2 is not affected by having some other list know about it, whereas the list cells of l1 are copied as they are appended).

A better implementation of reverse

//using an accumulator to speed things up; introducing auxiliary function definition
let reverse l =
let rec reverse_acc l a =
match l with
[] -> a
| (x::xs) -> reverse_acc xs (x::a)‏
reverse_acc l [];;
view raw gistfile1.fs hosted with ❤ by GitHub

This is an example of a function, reverse_acc, being defined as a local value inside the enclosing function, just as we might define an integer or string value.

Higher order functions

These are simply functions that have function arguments, like

// simple higher-order function
let twice f x = f(f x);;
// a free-standing function to square a number
fun x -> x*x;;
//Let's use it
(fun x -> x*x) 4;; // yields 16
twice (fun x -> x*x) 4;;
//Yields:
// (fun x -> x*x)((fun x -> x*x) 4) =
// (fun x -> x*x) 16 =
// 256
view raw gistfile1.fs hosted with ❤ by GitHub

And we can generalize twice like

let rec power f n x =
if n<= 0 then x else (power f (n-1) (f x));;
view raw gistfile1.fs hosted with ❤ by GitHub

to apply f n times.

Lists naturally lead us to higher order functions, as that is how we would describe operations that apply some simpler transformation to every element of the list.

//Create a new list by transforming each element of the input list
let rec map f l =
match l with
[] -> []
| (x::xs) -> (f x)::(map f xs);;
//Accumulate the elements of the list (in a general way)
// yields f(x1,f(x2,f(x3,f(x4,c))))‏ if l is [x1, x2, x3, x4]
let rec reduce f c l =
match l with
[] -> c
| (x::xs) -> f x (reduce f c xs);;
// Combine them to provide a different way to define length for a list
let len l = reduce ( + ) 0 (map (fun x -> 1) l);;
view raw gistfile1.fs hosted with ❤ by GitHub

where we use the fact that + is just a function with some syntactic sugar that allows us to write it like an infix operator.

Exercise: Define the function zip which when given two lists of the same length and returns a single list of pairs e.g. zip [1,2,3] [zero, one, zero] is [(1,zero),(2,one), (3,zero)]

Exercise: Define itlist which takes a function f, an initial value v, and a non-negative integer n and returns a list of the iterates of f on v, so itlist f v n is [v, f v, f(f v), f(f(f v))), ...., fn v]

Exercise: Define the function R_reduce f l c which returns f(f(f(f (x1, c) , x2) , x3), x4)if l is [x1, x2, x3, x4]

All these and more (though not always under the same names) are actually provided in the F# Lists module; these and the related functions in the other collections modules are your standard tools when working in the language. Not everything is list processing, but a surprising amount can be cast in that form.

The adder revisited

Now we have lists, we can model the way that the adder is supplied by sequences of bits representing a number, not just one at a time:

// note the reverse order of the lists at this point
let rec R_Adder bl1 bl2 c =
match (bl1, bl2) with
([],[]) -> []
| (x::xs, y::ys) ->
let (out, carry) = (Full_Adder x y c)
out::(R_Adder xs ys carry)
| (_) -> [];;
view raw gistfile1.fs hosted with ❤ by GitHub

Numbers may not be same length, so

  • Drop leading zeros
  • Pad with new leading zeros until they are the same length
// get rid of leading Zero's
let rec dropleadingzeros l =
match l with
[] -> []
| (Zero::xs) -> dropleadingzeros xs
| x -> x
// add n leading Zero's
let rec padby n l = if n <= 0 then l else padby (n-1) (Zero::l);;
// pad two lists to the same length
let rec pad (l1, l2) =
let len1 = length l1
let len2 = length l2
if (len1 = len2) then (l1,l2)
else if (len1 < len2) then (padby (len2-len1) l1, l2)
else (l1, padby (len1-len2) l2) ;;
view raw gistfile1.fs hosted with ❤ by GitHub

Numbers are fed to the adder LSB first, so would also need to be reversed from the natural MSB first form before and restored after, giving

let Adder bl1 bl2 =
let (l1,l2) = pad ((dropleadingzeros bl1), (dropleadingzeros bl2))
reverse(R_Adder (reverse (Zero::l1)) (reverse (Zero::l2)) Zero);;
view raw gistfile1.fs hosted with ❤ by GitHub

Generalising: Synchronous State Machine

The adder we saw above is a special case of a general form

which we can represent in code as

let rec StateMachine f s l =
match l with
[] -> []
| (x::xs) ->
let (out, state) = (f x s)
out::(StateMachine f state xs);;
view raw gistfile1.fs hosted with ❤ by GitHub

and in this form, the LSB-first implementation of the adder looks like

let X_R_Adder bl1 bl2 c =
let UC_Full_Adder (a,b) c = Full_Adder a b c
StateMachine UC_Full_Adder Zero (zip bl1 bl2) ;;
view raw gistfile1.fs hosted with ❤ by GitHub