Tuesday, April 12, 2011

Generalized IEnumerable to chunked IEnumerable in F#

F# sequence expressions are complicated beasts under the covers; for the purposes of today's exercise, the factor of interest is that, unlike their C# counterparts, they Dispose() internal enumerators aggressively, so the simple Ratchet wrapper isn't enough by itself. So, with some refactoring, we get

Note that this version closes the stream after the iteration completes -- we could leave the management of the stream entirely outside the iterator if we wished. Since we are re-using our Enumerator instance, we only have the one left for garbage collection, so suppressing the disposal doesn't litter too badly.
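The idea of re-using a single enumerator while suppressing its disposal can be sketched roughly as follows. This is not the Ratchet/Controlled code from this post, just a minimal illustration under stated assumptions: the names NonDisposing, shareEnumerator and chunksOf are hypothetical, and the sketch relies on Seq.truncate checking its count before calling MoveNext(), so that it never consumes an extra element.

```fsharp
open System.Collections
open System.Collections.Generic

// Hypothetical helper: forwards to an inner enumerator but ignores Dispose(),
// so an aggressive Dispose() from a sequence expression cannot end the iteration.
type NonDisposing<'T>(inner : IEnumerator<'T>) =
    interface IEnumerator<'T> with
        member x.Current = inner.Current
    interface IEnumerator with
        member x.Current = box inner.Current
        member x.MoveNext() = inner.MoveNext()
        member x.Reset() = inner.Reset()
    interface System.IDisposable with
        member x.Dispose() = ()   // deliberately suppressed

// Hypothetical helper: a sequence that hands out the same enumerator every time,
// so each consumer resumes where the previous one stopped.
let shareEnumerator (source : seq<'T>) : seq<'T> =
    let e = new NonDisposing<'T>(source.GetEnumerator()) :> IEnumerator<'T>
    { new IEnumerable<'T> with
          member x.GetEnumerator() = e
      interface IEnumerable with
          member x.GetEnumerator() = e :> IEnumerator }

// Pull disjoint chunks of at most n elements until the source runs dry.
// The result is one-shot: it can only be enumerated once.
let chunksOf (n : int) (source : seq<'T>) : seq<'T []> =
    let shared = shareEnumerator source
    Seq.initInfinite (fun _ -> shared |> Seq.truncate n |> Seq.toArray)
    |> Seq.takeWhile (fun chunk -> chunk.Length > 0)

// Chunk sizes for a 42-element sequence split into pieces of 16.
printfn "%A" (chunksOf 16 (seq { 1 .. 42 }) |> Seq.map Array.length |> Seq.toList)
// prints [16; 16; 10]
```

In this sketch the underlying enumerator is never disposed at all; a real version would dispose it (and close any stream behind it) once the last chunk has been read.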

Overall, if you're in the simple use-case of pulling chunks out of a file in order, the earlier version of the system will suffice.

Edit 1-May-11 : Comparing with 2#4u's Seq.breakBy I get on my laptop

  • Real: 00:00:03.948, CPU: 00:00:03.946 for Seq.breakByV1
  • Real: 00:00:00.345, CPU: 00:00:00.374 for Seq.breakByV2
  • Real: 00:00:00.910, CPU: 00:00:00.904 for Chunk.Window

making the functional Chunk.Window implementation about 2.4 times slower (by CPU time) than the faster of the two, which achieves its speed with more internal mutability.


Mark Rockmann said...

Two versions and still the wrong code... Repeatedly calling Seq.truncate (or Enumerable.Take) will always yield the same elements. Btw, what about Seq.windowed?

Steve Gilham said...

1) What do you think Ratchet and Controlled are doing?

2) If you try that code, you will see that it passes the tests.

3) As I said earlier, "I may have missed these in F#"

Steve Gilham said...

4) Seq.windowed doesn't do what Chunk.Window does. windowed slides a window through the sequence one element at a time (useful for doing moving averages); Window returns the sequence partitioned into disjoint adjacent sections (useful for breaking a file into fixed-size pieces to process).

Doing this the TDD way, define input as above and go

let chunks = input |> Seq.windowed 16 |> Seq.toArray

and the number of chunks comes out as val it : int = 27 (= 42-16+1) and not the value 3 (= (42+16-1)/16).
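The two counts can be checked side by side. This is a small sketch that assumes a stand-in input of 42 integers (the original test input is not reproduced here) and uses Seq.chunkBySize, which is only available from F# 4.0 onward, for the disjoint partition.

```fsharp
// Assumption: a stand-in for the 42-element test input.
let input = seq { 1 .. 42 }

// Sliding windows: one window per start position, 42 - 16 + 1 = 27 of them.
let sliding = input |> Seq.windowed 16 |> Seq.toArray

// Disjoint adjacent chunks: (42 + 16 - 1) / 16 = 3 of them (sizes 16, 16, 10).
let disjoint = input |> Seq.chunkBySize 16 |> Seq.toArray

printfn "%d windows, %d chunks" sliding.Length disjoint.Length
// prints "27 windows, 3 chunks"
```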