Saturday, October 29, 2011

F# under the covers XIV -- Match expressions : Does my complexity look big in this?

The code generation in F# in debug builds is fond of adding plenty of extra branch points that will confuse any naive computation of code complexity based on counting branch instructions at the IL level. Let us take the innocent method

let function alist [...other args...] =
match alist with
| h :: tail -> // some computation
| [] -> []
view raw gistfile1.fs hosted with ❤ by GitHub

The instructions that map to the match alist with context include

IL_0004: call instance class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<...`1<float64>>>::get_TailOrNull()
IL_0009: brtrue.s IL_000d
IL_000b: br.s IL_0044
IL_000d: ldloc.0
view raw gistfile1.cs hosted with ❤ by GitHub

where the first branch could be replaced with a brfalse.s IL_0044 and the code at offsets 0xb, 0xc replaced by nops with no change in the semantics -- but as given it adds an extra cycle to the flow of control graph, and hence false complexity.

The release build of the code retains the first two instructions, but then rather than jumping to get and return an empty list, it in-lines that fraction of the code. If we look at it in C#, the release build looks like

FSharpList<...> function(FSharpList<...> alist, ...)
{
if (plist.TailOrNull == null)
{
return FSharpList<...>.Empty;
}
// some computation
view raw gistfile1.cs hosted with ❤ by GitHub

where the debug version has a goto in that if clause that jumps to a statement that returns the empty list.

As an aside, it's also worth noting that nowhere in the .pdb information is there a map to any of the match expressions : there is never a source context which matches the | h :: tail -> or | [] ->; it's all collapsed into the match ... with fragment.

For more complex match expressions such as

match context with
| :? Type1 as t1 -> // type 1
| :? Type2 as t2 -> // type 2
| :? Type3 as t3 -> // type 3
| other -> // something else
view raw gistfile1.fs hosted with ❤ by GitHub

the debug IL organizes into trying each of the branch criteria in turn, then jumping ahead to the matched expression; this ends up with a series of redundant branches like

IL_001f: brfalse.s IL_0023
IL_0021: br.s IL_003f
IL_0023: ldloc.0
view raw gistfile1.cs hosted with ❤ by GitHub

culminating in a final triple burst

IL_0039: brfalse.s IL_003d
IL_003b: br.s IL_0077
IL_003d: br.s IL_0085
IL_003f: ldloc.1
view raw gistfile1.cs hosted with ❤ by GitHub

whereas the release build treats it more as an if..elseif..elseif..else construct.

As I noted yesterday, this means that there can be a factor of at least two difference in the naively computed cyclomatic complexity of a method depending whether the release or debug build is chosen.

No comments :