Monday, May 03, 2021

F# under the covers XVIII -- lambdas and closures

Consider this code, which uses both named and anonymous inner functions:

  let F1 l =
    let aux i = i + 1

    let FI li =
      let rec FII lii acc =
        match lii with
        | [] -> acc
        | x :: xs -> FII xs (aux acc)
      FII li 0
    l |> List.map (fun i -> (string i).Length)

The inner functions are compiled as FSharpFunc objects, with values closed over being injected as constructor arguments.

Before .NET 5.0.200, function F1 would be compiled to something that decompiles to C# as

public static FSharpList<int> F1<a>(FSharpList<a> l)
{
	FSharpFunc<int, int> aux = new aux@9();
	FSharpTypeFunc FI = (FSharpTypeFunc)(object)new FI@11(aux);
	return ListModule.Map<a, int>((FSharpFunc<a, int>)new F1@17<a>(), l);
}

With .NET 5.0.200, the compiler takes account of the fact that some of the inner functions -- like aux above -- close over nothing, and avoids needless object creation, in the same way that C# lambdas have long been cached after first use.

public static FSharpList<int> F1<a>(FSharpList<a> l)
{
	FSharpFunc<int, int> aux = aux@9.@_instance;
	FSharpTypeFunc FI = (FSharpTypeFunc)(object)new FI@11(aux);
	return ListModule.Map<a, int>((FSharpFunc<a, int>)F1@17<a>.@_instance, l);
}

where the aux and F1@17 functions -- the latter being the anonymous function used by List.map -- are referenced through a static readonly field internal to each class, rather than a new instance being created on every call. FI, which closes over aux, is still allocated each time.
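
As a rough illustration of which inner functions qualify, the sketch below puts a function that closes over nothing next to one that captures a parameter. The names shiftAll, offset and shifted are made up for this example. Going by the decompiled output above, only the first would be expected to get a shared static instance; the code actually generated will depend on compiler version and optimization settings.

```fsharp
// Hypothetical example: shiftAll, offset and shifted are illustrative names only.
let shiftAll (offset: int) (l: int list) =
  // Closes over nothing, like aux above: a candidate for one cached instance.
  let aux i = i + 1
  // Closes over the parameter offset, so it would be expected to need
  // a closure object carrying that value on each call to shiftAll.
  let shifted i = i + offset
  l |> List.map aux |> List.map shifted
```

For example, shiftAll 10 [1; 2; 3] evaluates to [12; 13; 14].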

String processing as a fold

I recently needed to ensure that non-ASCII (high-bit set) characters in XML/HTML text were presented as character references. The text contained no control codes aside from line breaks, so the obvious algorithm in C#, using a StringBuilder, sb, was

  foreach( char ch in text )
  {
    if ( ch < 127 ) 
      { sb.Append(ch); }
    else
      { sb.AppendFormat( "&#x{0:X4};", (int) ch ); }
  }

In F# though, the obvious direct Seq.iter translation ends up having to |> ignore the results of the append operations. Since this is really an accumulation into the StringBuilder, a better functional representation would be more like

  let sb = Seq.fold (fun (b:StringBuilder)
                         (c:char) -> let ic = int c
                                     if ic >= 127
                                     then b.AppendFormat( "&#x{0:X4};", ic )
                                     else b.Append(c))
              (StringBuilder(text.Length + extra)) // estimate the expansion up front
              text

which lets the StringBuilder flow naturally through the process, rather than closing over it and having to discard the value of the if expression. This could be done in C# too, along the lines of

  var sb = text.Aggregate(new StringBuilder(), (b, c) =>
                                     {
                                       // cast to int so that the X4 format applies
                                       if (c >= 127)
                                         { return b.AppendFormat("&#x{0:X4};", (int) c); }
                                       else
                                         { return b.Append(c); }
                                     });

only here the returns have to be explicit.
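
For anyone wanting to try the F# fold directly, here is a minimal self-contained sketch of it wrapped in a function. The name escapeNonAscii and the value chosen for extra are assumptions made for this example; the fragment above leaves extra as an estimate to be supplied by the caller.

```fsharp
open System.Text

// Hypothetical wrapper name; not part of the original fragment.
let escapeNonAscii (text: string) =
  // Assumed allowance for expansion; tune to the expected input.
  let extra = 16
  let sb =
    Seq.fold
      (fun (b: StringBuilder) (c: char) ->
        let ic = int c
        if ic >= 127 then
          // Emit a hexadecimal character reference.
          b.AppendFormat("&#x{0:X4};", ic)
        else
          // Plain ASCII passes through unchanged.
          b.Append(c))
      (StringBuilder(text.Length + extra))
      text
  sb.ToString()
```

With this definition, escapeNonAscii "café" gives caf&#x00E9; while plain ASCII text, line breaks included, comes back unchanged.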