Friday, April 15, 2011

Applying IEnumerable to chunked IEnumerable in F# -- Cipher Feedback mode

Having developed the supporting infrastructure to pull data from any IEnumerable into a sequence fixed length buckets, we can now apply it to some practical use. One such is to perform encryption of an input byte sequence in a block-cipher feeback (CFB) mode.

CFB mode is so simple, it almost feels like cheating. Given an initialization vector (IV) -- an arbitrary block of bytes -- as the 0-th ciphertext block, the iterative definition is that the n-th ciphertext block is the encrypted value of the n-1-th block XOR'd with the n-th plaintext block; and by the symmetry of XOR, the n-th plaintext block is recovered from the n-th ciphertext block by the same XOR.

So, let's implement it, and do a simple test that we can easily work out the cipherstream for -- and doesn't actually involve real encryption. So let's take 128-bit blocks, all zeros for the IV, and the identity transformation for our "encryption", and have the integers 0 to 41 as input.

The first ciphertext block will be all zeroes (IV encrypted) XOR'd with 0-15, i.e. 0-15. That XORs with 16-31 to give a second block of all 16s; the final block is 32-41 XORd with 16s, i.e. 48-57.

So, armed with expectations we can write the implementation with appropriate self-test code, thus --

namespace Tinesware.Enumerables
open System
open System.Collections.Generic
open System.IO
open System.Linq
open Should.Fluent
// An internal type to wrap an IEnumerator<'a>
// and disallow arbitrary disposal.
type private Controlled<'a>(iterator: IEnumerator<'a>) =
interface IEnumerator<'a> with
member self.Current = iterator.Current
interface System.Collections.IEnumerator with
member self.Current = iterator.Current :> Object
member self.Reset() = iterator.Reset()
member self.MoveNext() = iterator.MoveNext()
interface IDisposable with
member self.Dispose() = () // The whole purpose of the type -- prevent premature disposal
// Changes the semantics of any sequence to be streamlike
// There is just the one iterator which marches forwards
// inexorably, whatever the underlying sequence
type StreamingSequence<'a>(source: IEnumerable<'a>) =
let enumerator = source.GetEnumerator()
let iterator = new Controlled<'a>(enumerator) :> IEnumerator<'a>
member self.Release() = enumerator.Dispose()
interface IEnumerable<'a> with
member self.GetEnumerator() = iterator
interface System.Collections.IEnumerable with
member self.GetEnumerator() = iterator :> System.Collections.IEnumerator
module Chunk =
/// <summary>
/// Split an enumerable into an enumeration of arrays
/// </summary>
/// <typeparam name="T">Element type of the array</typeparam>
/// <param name="source">The IEnumerable to operate on</param>
/// <param name="chunk">The window size</param>
/// <returns>The enumeration</returns>
let Window (chunk : int) (source : 'a seq) =
let rec innerWindowing (x :StreamingSequence<'a>) =
seq {
let result = x |> Seq.truncate chunk |> Seq.toArray
if result.Length > 0 then
yield result
yield! innerWindowing x
else
x.Release()
}
// StreamingSequence makes the truncate/skip behaviour
// predictable, whatever the semantics of the
// underlying sequence provider
innerWindowing <| new StreamingSequence<'a>(source)
// Exclusive-or two byte sequences to the
// length of the shorter; returns an array
let (^) (a: byte seq) (b: byte seq) =
Seq.zip a b
|> Seq.map (fun x -> (fst x ^^^ snd x))
|> Seq.toArray
// Produce a function that performs cfb-mode enciphering
// given the block cipher forward encryption and an IV
// Alas the type system doesn't permit us to say that all
// these byte arrays must be the same size -- that would
// take C++, I fear
let CfbEncrypt (encrypt : byte[] -> byte[]) (iv : byte[]) =
let cn_1 = ref iv
let operation (plaintext : byte[]) =
let ecn_1 = encrypt (!cn_1)
let cn = ecn_1 ^ plaintext
cn_1 := cn
cn
operation
/// <summary>
/// A Self-test program for CFB mode
/// </summary>
[<EntryPoint>]
let main a =
let input = seq { 0 .. 41 }
|> Seq.map byte
|> Seq.toArray
let iv = seq { 1 .. 16 }
|> Seq.map (fun x -> 0uy)
|> Seq.toArray
let cfb = CfbEncrypt id iv
let output = input
|> Window 16
|> Seq.map cfb
|> Seq.collect id // unWindow
|> Seq.toArray
let expected = [| 0uy; 1uy; 2uy; 3uy; 4uy; 5uy; 6uy; 7uy;
8uy; 9uy; 10uy; 11uy; 12uy; 13uy; 14uy; 15uy; // zeros xor 0-15
16uy; 16uy; 16uy; 16uy; 16uy; 16uy; 16uy; 16uy;
16uy; 16uy; 16uy; 16uy; 16uy; 16uy; 16uy; 16uy; // 0-15 xor 16-31
48uy; 49uy; 50uy; 51uy; 52uy; 53uy; 54uy; 55uy;
56uy; 57uy; |] //16s xor 32-41
output.Length.Should().Equal(input.Length) |> ignore
let matching = Seq.forall2 (fun x y -> x = y) expected output
matching.Should().Be.True |> ignore
// new instance to reset the IV
let cfb = CfbEncrypt id iv
let decrypt = output
|> Window 16
|> Seq.map cfb
|> Seq.collect id // unWindow
|> Seq.toArray
decrypt.Length.Should().Equal(input.Length) |> ignore
let matching = Seq.forall2 (fun x y -> x = y) input decrypt
matching.Should().Be.True |> ignore
0
view raw gistfile1.fs hosted with ❤ by GitHub

where I've renamed the previous private Ratchet class to the public StreamingSequence type -- a more self-explanatory name for possible re-users; and exposed the Dispose method of the constant (reference, mutable internal state) IEnumerator for end-of-enumeration use.

Here, Seq.collect is being used to flatten the sequence after the block-level operation has been done.

CFB mode is so simple that -- having already done the hard work of breaking any sequence into blocks -- the only thing that stopped the test running through first time was forgetting to re-initialize the CFB operation for the decryption pass.


Actually, the decryption operation shouldn't be working -- but somehow it does, even when I push in random input, and replace the identity encryption operation with a one-way hash! Something weird is going on here.

3 comments :

Carsten said...

Hi,
just some short comments.

If you have something like an uncurry operator (see for example http://fssnip.net/3n) you can make (^) much easier by replacing "(fun x -> (fst x ^^^ snd x))" with "(uncurry (^^^)".

I think it might be Fsharper ;) if you would use tail-recursion instead of ref-cells in CfbEncrypt.

Steve Gilham said...

The cfb operation could be cast as tail-recursive byte[] -> byte[]; but at the cost of accumulating all the input in memory and having to reverse it (or doing O(n^2) appends-to-end) within the function call.

Hiding state via closure inside the function returned from CfbEncrypt -- making it no longer a pure function -- preserves the pipeline behaviour : you can pull one block at a time through the process, requiring constant space whatever the sequence length.

Steve Gilham said...

That should be

"tail-recursive byte[] seq -> byte[] seq"

of course.