When I was at the barbers' the other day, the puzzle section of the newspaper had a word game that involved taking a 9 letter word and using the letters to fill in a degenerate crossword box with 1 Across being 5 letters, and 2 Down also 5 letters, starting at the 3rd letter of 1 Across. So naturally, I thought "I can write a script for that", so I did.
My first take used Set operations because the target word had no duplicate letters; so I thought, "let's generalise this, and tidy up the scruffy looking bits" so I could get all the possible variations.
Which left me with this
which ran like a slow thing in SlowTown -- it took a few minutes to scan 97 words beginning with "ab" and yield 9 words with a unique decomposition.
As my original take hadn't seemed to run anywhere near as sluggishly, I reverted all the Seq.choose
and match
to Array.filter
and Option.isSome
and got an order of magnitude speed-up.
So how much faster could I make it? I tried a lot of things -- everything as a seq, cache the sequence form of the words, adding a look-up table so given one five-letter word, I could look up a precomputed list of ones starting with its third letter, -- but the times all stuck around the 10 second mark for the "ab" words (or got worse, like when I cached the result of taking the five letter word from the 9-letter one).
The only thing that made any positive difference was replacing the immutable structures in the test for whether a word was a subset of another, like
That got about a factor of 3 improvement, and worked through all 10803 9-letter words and 4717 5-letter ones to yield 1942 candidates in about eight and a half minutes.
So I'm baffled about what is making the slow case so slow, but I'm making this note about it here in case I hit it again.
This was all done on my '07 vintage laptop using F#3.0, with the word list I got through Stack Overflow, for those who want to try out timings. A British word list, of course, for a puzzle in a British newspaper.