Tuesday, April 12, 2011

Generalized IEnumerable to chunked IEnumerable

As noted in the comments to the previous post, Window will loop forever if the Enumerable it is applied to doesn't maintain its own internal state. In that case each IEnumerator yielded up always starts at the beginning of the iteration, so every chunk is the first chunk. So if you're not using the two in conjunction for the purpose of "read a file in chunks, e.g. for passing over a network" or similar, you want this variant:

where the Ratchet type ensures that the same IEnumerator instance is yielded up each time. At this point you have to be rather Jesuitical about what is immutable anyway: is it the sequence that returns you different mutable objects, or the one that returns you the same object you may already have mutated...
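The original listing for this variant isn't reproduced here. The following is a rough C# sketch of the idea, assuming that Window cuts each chunk with Take(size).ToArray(); that is an assumption, not the post's confirmed code. Ratchet<T> captures one enumerator from its source and returns it from every GetEnumerator() call, so each Take resumes where the previous one stopped instead of starting over.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch, not the post's original listing.
// Every GetEnumerator() call returns the same underlying enumerator,
// so each pass over a Ratchet resumes where the previous pass stopped.
public sealed class Ratchet<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> shared;

    public Ratchet(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        shared = source.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator() => shared;

    IEnumerator IEnumerable.GetEnumerator() => shared;
}

public static class WindowExtensions
{
    // Cuts source into arrays of at most `size` elements.
    public static IEnumerable<T[]> Window<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var ratchet = new Ratchet<T>(source);
        while (true)
        {
            // Each Take() pulls the next `size` items from the shared enumerator.
            // Note: Take() also disposes that enumerator when it finishes.
            T[] chunk = ratchet.Take(size).ToArray();
            if (chunk.Length == 0) yield break;
            yield return chunk;
        }
    }
}
```

This naive version still lets Take() dispose the shared enumerator after every chunk. That is harmless for an array, whose enumerator ignores Dispose(), but not for every source.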

F# is less convenient for this one, though :(

3 comments:

bill seddon said...

Steve, thanks for the example. I've come across some behaviour I don't understand and wonder if you've any insight.

Imagine you want to skip the first few bytes of an array and chunk the rest. The natural thing to do is call .Skip(). However, the Window mechanism then only ever returns one window: the first.

One solution is to call .Skip(n).ToArray(), but this creates yet another copy of the array.

I have been able to work around the issue by adding a parameter to the Window() extension method that represents the number of elements to skip. This value is used to advance the iterator stored in the ratchet by calling its MoveNext method. It works, and I understand why it works. What I don't understand is why calling .Skip() causes a problem. Skip() returns a SkipIterator instance rather than the input array's iterator. This must be part of the problem, but why?
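Bill's workaround might look something like the following hypothetical overload. The skip parameter name is assumed, and a minimal Ratchet<T> is repeated so that the sketch compiles on its own. The leading elements are discarded by calling MoveNext() on the shared enumerator directly, so no SkipIterator sits between the array and the ratchet, and the only enumerator Take() ever disposes is the array's own, which ignores Dispose().

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Minimal Ratchet<T>, repeated so this sketch compiles on its own:
// every GetEnumerator() call returns the same underlying enumerator.
public sealed class Ratchet<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> shared;
    public Ratchet(IEnumerable<T> source) { shared = source.GetEnumerator(); }
    public IEnumerator<T> GetEnumerator() => shared;
    IEnumerator IEnumerable.GetEnumerator() => shared;
}

public static class WindowExtensions
{
    // Hypothetical overload: the first `skip` elements are consumed by calling
    // MoveNext() on the ratchet's shared enumerator rather than via Skip().
    public static IEnumerable<T[]> Window<T>(this IEnumerable<T> source, int size, int skip)
    {
        var ratchet = new Ratchet<T>(source);
        IEnumerator<T> shared = ratchet.GetEnumerator();
        for (int i = 0; i < skip && shared.MoveNext(); i++)
        {
            // Element discarded; only the enumerator's position matters.
        }

        while (true)
        {
            T[] chunk = ratchet.Take(size).ToArray();
            if (chunk.Length == 0) yield break;
            yield return chunk;
        }
    }
}
```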

Steve Gilham said...

It looks like the SkipIterator is showing the same behaviour I observed with F# sequence expressions: the iteration is being Dispose()d, and so rendered non-functional, after the first chunk is extracted.

The workround is to add an extra helper class that implements IEnumerator<T> and forwards every call except Dispose() to the real IEnumerator<T>, and to return an instance of that class from the GetEnumerator() method of Ratchet. This is equivalent to the F# Controlled type in the next post.
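A minimal sketch of that helper, with hypothetical names (NonDisposingEnumerator<T>, Release): the wrapper forwards Current, MoveNext() and Reset() to the real enumerator and turns Dispose() into a no-op, and Ratchet<T> hands out the wrapper instead of the raw enumerator. As an addition not described in the comment, Window disposes the real enumerator itself, once, in a finally block, so a source such as a file reader is still cleaned up when iteration ends.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Forwards every IEnumerator<T> member to `inner` except Dispose(), which it
// ignores, so LINQ operators such as Take() cannot shut the iteration down early.
public sealed class NonDisposingEnumerator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> inner;
    public NonDisposingEnumerator(IEnumerator<T> inner) { this.inner = inner; }
    public T Current => inner.Current;
    object IEnumerator.Current => inner.Current;
    public bool MoveNext() => inner.MoveNext();
    public void Reset() => inner.Reset();
    public void Dispose() { } // deliberately a no-op
    public void DisposeInner() => inner.Dispose();
}

// Ratchet that hands out the wrapper rather than the raw enumerator.
public sealed class Ratchet<T> : IEnumerable<T>
{
    private readonly NonDisposingEnumerator<T> shared;
    public Ratchet(IEnumerable<T> source)
    {
        shared = new NonDisposingEnumerator<T>(source.GetEnumerator());
    }
    public IEnumerator<T> GetEnumerator() => shared;
    IEnumerator IEnumerable.GetEnumerator() => shared;
    // Releases the real enumerator; called once by Window when it finishes.
    public void Release() => shared.DisposeInner();
}

public static class WindowExtensions
{
    public static IEnumerable<T[]> Window<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var ratchet = new Ratchet<T>(source);
        try
        {
            while (true)
            {
                T[] chunk = ratchet.Take(size).ToArray();
                if (chunk.Length == 0) yield break;
                yield return chunk;
            }
        }
        finally
        {
            // Runs when the caller finishes or abandons the iteration.
            ratchet.Release();
        }
    }
}
```

With the wrapper in place, data.Skip(2).Window(2) over { 9, 9, 1, 2, 3, 4, 5 } yields { 1, 2 }, { 3, 4 } and { 5 } rather than stopping after the first window.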

bill seddon said...

Thanks for the suggestion: that was exactly the problem. With a wrapper in place, it's then possible to watch the enumerator's Dispose() method being called after each .Take() (or .ToArray()).