Distributed Memory: May 2007

Thursday, May 31, 2007

Links for 31-May

Google Gears -- Offline web apps

Pre- and post- advice in Ruby -- more metaprogramming

Let's build a grid -- Thinking about your layout. Some of this is applicable to general UI

Widown't for Rails -- preventing that one dangling word after a line-break

Spawning a process in IronPython

I've been using IronPython recently as a scripting language because its string handling and structured programming support are so much richer -- and more accessibly documented -- than anything old .BAT & .CMD file programming offers. The downside is that the old-style files make launching sub-processes trivial. So, in the spirit of DRY (Don't Repeat Yourself), here's the file to import.

And yes, this is really another example of how to use .Net rather than how to use Python.

Wednesday, May 30, 2007

Links for 30-May

Icebreakers -- interview questions designed to put you both at ease and break out of the scripted, stilted talk that results in bad hires and missing good hires (from the same source as the "Design a Monopoly Server" question)

Hygienic Macros -- practice safe meta-programming

Top 300 XSS sites -- Hall of Shame

Accessible expanding and collapsing menu

Tuesday, May 29, 2007

Links for 29-May

Compiler Lab : Second Day -- more Silverlight material from Microsoft

DLR in .Net 3.5 -- and related topics

Virtual Desktop inside the browser -- more Silverlight goodies. This looks seriously interesting, and would make things like browser appliances worthwhile.

Sunday, May 27, 2007

Anime — what I'm watching

This season: well MGLN StrikerS should be obvious, even though I don't think spreading the story so thinly is doing what should be a "Gosh, wow!" roller-coaster action series any favours.

From good to so-so of the rest of the current season, Seirei no Moribito, Bokurano, Marimite 3rd season OAVs (see previous review), Claymore (so-so Nihon-meets-D&D), Rocket Girls (harmless fluff), El Cazador de la Bruja (Noir in Mexico), Moonlight Mile (lumbering cash-in of Planetes).

Catching up on older stuff: Yokohama Kaidashi Kikou OAVs (cute, chill-out), Mahou-shoujotai (teeny witches!), Aria the Animation (more chill-out, with Natural queued), Busou Renkin (OTT Japan-style superheroes), GitS:SAC 2nd Gig (well stalled).

And I really need to get around to Paranoia Agent, Texhnolyze and Narutaru.

Looking to the top spots of the season to date:

Bokurano

The "art" favourite of the season, based on a manga with the motif of a mecha powered by the suffering of damaged children. I'm not sure how well the anime is going to keep up with some of the more raw parts. And as the manga hasn't ended its story yet, the anime is going to have to pull something out of thin air or just end unresolved.

Seirei no Moribito

A fantasy set in some mythic Asia, apparently based off a novel. Not hyperkinetic, just quietly understated, and shaping up well.

Anime — Le Chevalier d'Eon

France, second half of the 18th century. The murder of his sister, Lia, brings d'Eon de Beaumont into a web of intrigue, quicksilver-powered zombies and Illuminated history.

As one of "four musketeers" serving in the King's Secret (the King being Louis XV), he follows revolutionaries to the Russia of Elizabeth and Catherine; and then to England to confront Father Sir Francis Dashwood at Medmenham Abbey, before returning again to France to find the one who killed his sister; and all the time avoiding the plots of the Comte Saint-Germain, the Count of Cagliostro, and the nefarious Maximilien Robespierre, who seemed to be too close to his sister.

Oh, and he keeps being possessed by his sister's spirit, and invoking the magick of the Psalmists, as well as kicking righteous ass in the pursuit of justice and in the name of France.

The history is a bit “:lol: Japan” and uses Psalms as western writers might use Buddhist sutras — but enough buckles get swashed in the expected style for this to be a fun romp. Recommended.

24 episodes, and apparently becoming available in R1 DVD.

Friday, May 25, 2007

FePy r6 + 1 line of code runs PyFit

Following up this trail again --

IPCE-r6 doesn't run PyFit out of the box, but it gets a lot closer than the mainline does:

Traceback (most recent call last):
 File C:\PyFIT-0.8a1\fit\FitServer.py, line 7, in Initialize
 File , line 0, in __import__##4
 File C:\PyFIT-0.8a1\fit\fitnesse\FitServerImplementation.py, line 49, in Initialize
 File , line 0, in __import__##4
 File C:\PyFIT-0.8a1\fit\fit\Fixture.py, line 21, in Initialize
 File , line 0, in __import__##4
 File C:\PyFIT-0.8a1\fit\fit\TypeAdapter.py, line 34, in Initialize
 File , line 0, in __import__##4
 File C:\PyFIT-0.8a1\fit\fit\taBase.py, line 36, in Initialize
 File C:\PyFIT-0.8a1\fit\fit\taBase.py, line 82, in TypeAdapter
AttributeError: 'module' object has no attribute 'ast'

This line is the first to look for compiler.ast members to initialise data members. Looking at the IPCE bundle, C:\IPCE-r6\Lib\compiler\ast.py exists and has the values required. It's just that the package's __init__.py is empty, so we can patch C:\IPCE-r6\Lib\compiler\__init__.py to be

import ast

and it all works, without having to touch PyFit code at all.

This is the "no change to client code" equivalent to decorating PyFit with a suitably guarded explicit load:

import fepy
fepy.install_option('ast')

The behaviour is not a bug, but a feature as FePy by default does not load code that is rarely used, so as to speed start-up.

Note

We still don't have parsing of lists, tuples, dictionaries or complex numbers as arguments, since we still don't have the transformer.parse method to expose as compiler.parse; the unit tests in TypeAdapterTest.py report

Ran 117 tests in 2.468s

FAILED (failures=3, errors=5)

However real tests that don't have these non-scalar types do run happily on FePy-r6.

Tuesday, May 22, 2007

C++/CLI

Notes on mixed types -- the invaluable gcroot<T> template.

Mixing Native and Managed Types in C++ -- plus AutoPtr generic

AutoPtr revisited.

Silverlight -- more than just Flash

A lot of -- almost all of -- the talk about the new Silverlight technology from Microsoft has been about the "shiny", the ability to place heavyweight UI in the browser, and whether or not it will succeed in displacing Flash in a way that Java applets never managed to (but see also here).

Somewhat lost in all this has been what the technology is, especially in the 1.1 release, currently in alpha. Of course this isn't new -- remember all the "What is .Net?" thrashing about that took place when that technology was first announced -- all talk of web services and "Hailstorm" (MSFT Passport-on-steroids) -- and what we understand by the term now, over 5 years down the line.

The basis of the Silverlight technology is a portable version of the CLR (.Net) framework, referred to as the Core CLR. This is the bytecode engine and a slightly cut-down set of APIs, though cut down in different ways to the .Net Compact framework -- it relies on APIs not present in that version.

And then comes the good bit.

Layered on top of the CLR is the new DLR (Dynamic Language Runtime), which abstracts the flexible duck-typing behaviour seen in JavaScript, Ruby and Python -- and these (along with the inevitable VB) are the target languages for the DLR support in coming releases. Python and JavaScript (EcmaScript 3.0) are in the 1.1 alpha, with the others to follow. More... Yet more...

The DLR has also appeared in the refactoring of IronPython -- and the recent 2.0 alpha release includes the Microsoft.Scripting.dll that is at the heart of the DLR (though the current transition state is made apparent by there being a Microsoft.Scripting.Vestigial.dll representing the remains of a first cut at refactoring to separate out what is now the DLR).

Two blog posts from one of the DLR developers (who's also been involved in Jython as well as IronPython) on how dynamic objects are being modelled in the CLR

and the start of a series on how compilation is handled (DLR Trees, part 1), which shows how it is complementing the LINQ features of C# 3

With IronPython and, soon, IronRuby as first class languages for the .Net framework there seems to be some hope that the .Net == C# (or, for those who want to join a sort of Coke vs Pepsi style debate, .Net == C# or VB) status quo may actually be coming to an end.

MSBuild -- build from solutions without DevStudio

This is a tool that had passed me by until I spotted a passing reference to it just the other day. Buried in your C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 folder is an executable called MSBuild, which will take .*proj and .sln files and build them.

It consumes XML files in a format of which the .*proj files are a special case; and handles solutions (.sln files) as a special case. While I've not had chance to explore the ramifications of the XML build format, this appears to be Microsoft's own preferred build system and one that will bear investigation and adoption, especially for those projects (like Callisto) which use DevStudio for developer builds and thus have had it installed on their build machines, when they could have gotten by with just the .Net 2.0 Framework.

MSBuild with solutions -- how to capture the temporary script file generated from a solution; and how to actually build just one project from a solution

MSBuild with ASP.Net -- Visual Studio 2005 no longer uses project files for ASP.NET; this is how to roll your own from the solution.

Solution files and MSBuild -- Automated hoisting of project build from the solution file.

These latter look like something that could be ported into an IronPython script with ease.

…and caught up. Phew!

Interviewing

Now I've just finished a round of interviewing, time to let out some of the secrets. It's really about whether the candidate can think on their feet (and can work elegantly around places where they aren't familiar with detail) and knows general software engineering techniques in some depth (according to their length of experience).

Five essential phone-screen questions -- not bad at all

Job Interview 2.0, now with extra riddles -- Is moving Mt Fuji really the way you want to go? Or does it just select for those with an eye to making the solutions more complicated? Me? I'd not work for a company that uses this interview style. As for the 10/5/2/1 minute problem: I know the answer is 1 minute faster than the obvious one using the 1-minute runner, and if I wanted to do it again, I'd write a program to solve it.

"My favorite interview question" -- it's "How might you design a program that lets people play Monopoly with each other over the internet?" Not a question that I personally have tried, but the post analyses what makes this question a good one.

Learning Ruby

HacketyHack -- whytheluckystiff has now launched a "Ruby for kids" site to complement

why's [poignant] Guide to Ruby -- the original mind-expanding course; or for more staid heads

The Pickaxe Book -- 1st edition, on line

OK, so only the first one is news -- and something we've needed for years, since the old days of hands-on Basic on a PC are long past.

Links for 22-May

Hard-core concurrency considerations

JavaScript -- Lisp in 'C's clothing -- used to demonstrate the Y combinator

HTML5 now accepted for review

You think you know JavaScript but you have no idea -- excellent presentations

The high cost of free tools

An Introduction to Ranges (JavaScript/DOM)

Is HTML5 a slippery slope?

Zoomable UIs

Design Patterns Aren't

"Maybe" in Java

Process is about people

Managed Code Custom Actions : no support on the way and here's why -- why we still need to do installer custom actions the hard way

Apple's new Invisible UI

Another look at HTML5

Modularity, Ruby and doing the right thing -- cross language behaviour in Silverlight/DLR

C# and the compilation tax -- "Over the last four years, I've basically given up on the idea that .NET is a multiple language runtime." I know how he feels.

Web 2.0 is neglecting good design -- BBC News picks up on Jakob Nielsen's comments

Design by Grid -- grid layout in the post <table> age

A nostalgic look at using XMLHttpRequest with SOAP -- how to call a web service from your browser the "old-fashioned" way (i.e. how to write the code that ASP.NET AJAX does for you).

Silverlight at the command line -- Test your Silverlight applications in the real thing in scripts (the Silverlight engine is highly sandboxed, so it's going to be limited as an application context)

Silverlight : dynamic languages in the browser -- roundup article

Ruby on Rails on TDD -- a 15-step guide to test-driven web apps

JavaScript Libraries : the big picture -- high level, high speed. Also, good stuff in the comments.

Breaking out of the box -- Learn the rules and then break them for grid-based page designs.

LessMSI -- MSI unarchiver

Hackers and Fighters -- or, Theory vs. Practice

IronPython URLs -- aggregator blog

Hello, Dynamic Language Runtime-enabled World! -- IronPython 2.0 now runs on Mono. Shame that 1.2.4 came out about 24 hours before this was checked in.

Identity Providers, Authentication, Self-Issued cards

AD FS/SiteMinder integration via federation

Web Content Accessibility Guidelines 2.0 up for public review

The implications of OpenID

IronPython Console Syntax colouring and tab completion —
tl;dr: ipy -D -X:TabCompletion -X:ColorfulConsole

IronPython Community Edition (FePy) -- at v1.1 with added tighter integration with CPython libraries.

Reducing User Interface Friction

Color Oracle -- colour-blindness simulator

Current Browsers and WCAG1.0

JavaScript -- the Lingua Franca of the web

1023 bit number factorized. Time to move to 2kbits, and retire those decade-old PGP keys, I think.

Form validation with Prototype+Scriptaculous -- an example

Productivity tip -- not all Pentium-4 class processors are born equal.

Phew. Caught up

Monday, May 21, 2007

Links for 21-May

Getting there…

Sins of Software Security

Evolving the Web : HTML5 -- best viewed in Opera (latest public build) which supports some of the proposed new standard

Designing for Web 2.0

JeffCroft.com: Elegant Web Typography (link to PDF)

Why "Why Functional Programming Matters" Matters -- A good thinking piece.

Consuming OpenID in Python (Django) web apps

Rails Live CD distro

amb special form -- non-deterministic (ambiguous) computation

"inherit" behaviour via expression in IE -- still working around the omissions

Designing Interfaces: Patterns for Effective Interaction Design

Patterns in interaction design

7 JavaScript techniques you should be using today

New Recommendations for Using Strings in Microsoft .NET 2.0

Rails style creators in Java

How to prevent HTML tables from becoming too wide

Conway's Game of Life in JavaScript -- utilizing the Canvas HTML extension for procedural graphics supported at least by Firefox 1.5, Safari 1.3, Opera 9.0 and later versions of these browsers. (It will not work on Internet Explorer.)

Barrier-Free Web Design -- The essence of Accessibility

The problem with Configurability -- Usability is sensible defaults, not a dial for everything.

Code is for people to read

Strictness and correctness

An Initiate of the Bayesian Conspiracy

Economizing can be penny wise and pound foolish -- Is your code red, yellow or green?

Contrast and Meaning

Browsers will treat all HTML as HTML 5 -- Interesting conformance constraint for the new version

Programming Tip: Learn a graphics editor -- Always good to broaden your skill-set; especially when you have to fake-up UIs

The 128-bit programming challenge -- a piece of the zeitgeist.

Securing Orcas Workflow Services with CardSpace

What is Good Software Design?

Illiterate Programming

Learn Prolog Now

Style matters

Programming Quotations

Programming Languages: Application and Interpretation -- e-book

More than just writing code

Can software be just like Lego?

SVG browser support

That catches me up to 9-May-07

Sunday, May 20, 2007

"-Wall" is misleading

So, what happens when we compile this code "-Wall"?

Not the answer you'd expect --

$ gcc -Wall test.c
$ ./a.exe
Whoops!
$

However, there are quite a few warnings in the gcc documentation that are marked as not being controlled by -Wall -- so let's turn on a bunch of them:

$ gcc -Wall -W -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Werror -ansi test.c
test.c: In function `main':
test.c:8: warning: comparison between signed and unsigned
test.c: At top level:
test.c:3: warning: unused parameter 'argc'
test.c:3: warning: unused parameter 'argv'

Ah! That's better.

By contrast

C:\temp\unsign>cl test.c /W2
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c
Microsoft (R) Incremental Linker Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

C:\temp\unsign>cl test.c /W3
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c
test.c(8) : warning C4018: '<' : signed/unsigned mismatch
Microsoft (R) Incremental Linker Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

C:\temp\unsign>cl test.c /W4
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c
test.c(8) : warning C4018: '<' : signed/unsigned mismatch
test.c(3) : warning C4100: 'argv' : unreferenced formal parameter
test.c(3) : warning C4100: 'argc' : unreferenced formal parameter
Microsoft (R) Incremental Linker Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

The Microsoft compiler treats the signed/unsigned comparison as a level 3 warning, which is what you get by default from DevStudio, with the unused argument warning at level 4.

So what's going on here?

Well, in the comparison, unsigned int counts as the "larger" type, so a is converted to it; but unsigned int is not actually any larger in terms of bits, so the bit pattern is preserved, and a is interpreted as 2^32-1. Which is slightly larger than 1.
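The listing being compiled is not reproduced above, so what follows is only a minimal sketch of the kind of comparison that provokes the warning, written as C++ rather than C. The function names are hypothetical, and the sketch assumes a holds -1, which is what the 2^32-1 figure implies on a platform with 32-bit ints.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: compares a signed value against an unsigned one the
// way the flagged line does. The usual arithmetic conversions turn `a`
// into an unsigned int before the comparison is made.
bool less_than_mixed(int a, unsigned int b)
{
    return a < b; // this is the line compilers flag as signed/unsigned
}

// What the conversion does to the signed operand: -1 becomes UINT_MAX.
unsigned int as_unsigned(int a)
{
    return static_cast<unsigned int>(a);
}

// Checking for a negative value first gives the intuitive answer.
bool less_than_safe(int a, unsigned int b)
{
    return a < 0 || static_cast<unsigned int>(a) < b;
}

// Walks through the case discussed in the text.
void demonstrate()
{
    assert(as_unsigned(-1) == UINT_MAX); // 2^32-1 when int is 32 bits wide
    assert(!less_than_mixed(-1, 1u));    // "-1 < 1" comes out false
    assert(less_than_safe(-1, 1u));      // the guarded version says true
}
```

The guarded version is one way to silence the warning honestly; a bare cast would silence it while keeping the wrong answer.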

Some C and Java test and analysis tools

For static analysis of Java, the PMD tool (http://pmd.sourceforge.net/) can provide lint-like (or FxCop-like, depending on your exposure to these things) coverage of your source, with a configurable rule-set. It can generate its reports as part of an Ant build, and be integrated live into most Java IDEs to give live reports.

Like most of these tools, you will curse the first time you expose legacy code to it, and aim to clear the reports. In code that also has to play nice with .Net, you'll have to switch off two rules, MethodNamingConventions and LongVariable; and some of its rules (e.g. about empty default constructors) cannot always be applied consistently, so may need case-by-case suppression.

To go along with the use of JUnit tests, Cobertura (http://cobertura.sourceforge.net/) is a coverage tool that works by instrumenting the Java bytecode of the generated classes. It provides Ant tasks (and thus can be manually inserted into any Ant based build system, such as NetBeans projects) -- instrument the normal output in a post-compile step, and run JUnit against the instrumented code.

This tool gives you line and branch coverage reports like this sample -- all the way down from package-level summaries to line-by-line indications.

While most of this post is about Java, I might as well add to the pot a 'C'-based tool, Splint, which, as the name suggests, is an uprated lint-style tool, which includes in its analysis some basic probing for code that might be vulnerable to buffer overrun -- at least in cases when a buffer is passed into a routine without any length, and then gets written to.

I've not managed to track down any good free tools for C++ -- Microsoft's Prefast (at least as of 2 years ago) balked when fed code containing STL headers; other tools are 'C' only, or pay-ware. Given the syntactic complexity of C++, this is not so surprising.

The STL is your friend

Everyone knows this little idiom in 'C'

  char * array = malloc(string_length+1);
  if(!array)
  {
    // do something to handle allocation failure.
  }
  // do the real work
  free(array);

So the C++ equivalent must be

  char * array = new char[string_length+1];
  if(!array)
  {
    // do something to handle allocation failure.
  }
  // do the real work
  delete array;

Actually -- no.

The first bug is that having used new[], this must be matched by a delete[] (otherwise you will land yourself with interesting heap corruption to track down). The second is the behaviour of new when memory is not available.

Unless you have gone through some contortions to set up your build (in which case you know what you are doing), memory allocation failure is raised as the exception std::bad_alloc -- so take 2…

  char * array = NULL;
  try {
      array = new char[string_length+1];
  } catch (const std::bad_alloc&) {
    // do something to handle allocation failure.
  }
  // do the real work
  delete [] array;

Better, but you're still doing some of the heavy lifting yourself. Better yet is (assuming that you are working on the allocated space as a mutable buffer)

  std::vector<char> array; // zero length vector on the stack
  try {
     array = std::vector<char>(string_length+1);
  } catch (const std::bad_alloc&) {
    // do something to handle allocation failure.
  }
  // do the real work

or, if you're actually working with a string that is to be read from

  std::string array; // zero length string on the stack
  try {
     array = std::string(my_nil_terminated_string);
  } catch (const std::bad_alloc&) {
    // do something to handle allocation failure.
  }
  // do the real work

where the buffer management is handed over to code already written and tested to do the job for you.
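Pulling the vector and string fragments together, here is a hedged sketch of a complete routine that needs a mutable copy of a nil-terminated string. The function names and the "real work" (capitalising the first letter) are hypothetical, chosen only to give the buffer something to do; allocation failure is simply left to propagate as std::bad_alloc to a caller that can deal with it.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies a nil-terminated string into a mutable buffer
// that owns its storage. If allocation fails, std::bad_alloc propagates to
// the caller and nothing is leaked.
std::vector<char> make_mutable_buffer(const char* text)
{
    assert(text != nullptr);
    const std::size_t length = std::strlen(text);
    std::vector<char> buffer(text, text + length); // copy the characters
    buffer.push_back('\0');                        // keep the terminator
    return buffer;
}

// Hypothetical "real work": upper-case the first character in place.
std::string capitalise(const char* text)
{
    std::vector<char> buffer = make_mutable_buffer(text);
    if (buffer[0] >= 'a' && buffer[0] <= 'z') {
        buffer[0] = static_cast<char>(buffer[0] - 'a' + 'A');
    }
    return std::string(buffer.data()); // reads up to the terminator
}   // buffer's storage is released here, with no delete[] in sight
```

There is no matching clean-up call to forget or to mismatch: the vector's destructor does it on every exit path.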

Links for 20-May

The flood continues …

Solving FizzBuzz using compiler error messages -- or another example of how to use C++ templates as a functional programming tool

Redefining Professionalism for Software Engineers

JavaScript hijacking -- Interesting post and comment thread on poor idioms to use within AJAX/JSON type programming

Sandboxing JavaScript using <iframe> -- a technique to mitigate injection attacks

Hard-core customisation of Visual Studio 2005 -- here used to suppress the auto-generation of #region tags around automatic interface generation (which are a nice idea in theory, but soon turn into undocumentation in practice). Thinking of which

Avoiding Undocumentation -- an oldie, but a goodie

loglibrarian: How I Learn New Things These Days

UAC is for developers

Don't have a COW, man? What Haskell teaches us about writing Enterprise-scale software -- It's surprising what light a functional language, with immutable values can cast on a problem

Optimizations That Aren't (In a Multithreaded World) -- however... (Damned if you do, damned if you don't...)

The "Yes is No" problem -- with dynamic languages, what you see in the source may not be what you get at run-time

Pick a license, any licence

Breaking 104 bit WEP in less than 60 seconds

CSSVista -- Live editing tool for CSS in Firefox and IE simultaneously

Concurrency Crisis -- Is vanilla OO the right tool for handling inherently concurrent systems?

Less programming -- more skill?

Ruby Threads considered worthless -- mainly because they are interpreter-level (aka green) threads, not ones known to the OS (though that does not apply to JRuby, which uses the Java threading library instead).

JavaScript framework for Google Maps

More on HTTP.SYS and its security implications -- a must-read

Typography -- compose to a vertical rhythm

The Language of Accessibility -- build it in, don't build barriers

Prioritizing Web Usability -- Jakob Nielsen's latest book

Ruby code that will swallow your soul -- some of the tricks that Evil Ruby (Extends Ruby's semantics by accessing its internals from pure Ruby code) can enable

Pattern matching with Ruby -- no, not regexes; but something much more fun

SoftCoding -- A Daily WTF essay piece that spots a problem but doesn't quite draw the lesson from it -- the lesson being that Abstraction is somewhat like Optimization.

What if web apps worked like pin-ball machines? -- an interesting view of web idiom

Markup as Craft -- aiming for maintainability and usability

Guidelines for creating better mark-up

InfoCards and Identity Stability

CardSpace and Unique IDs

Agile Project management and Competitive Advantage

From Abstraction to Zipf

What does Barbara Liskov have to say about Equality in Java? -- same reasoning applies in C#

Architect's notes for Varnish -- a different look at storage paradigms

The truth about Lisp

Token Description Service for Cardspace

Walking, talking and quacking in Java -- Duck typing and interfaces

When in doubt, make it public

Mount ISO as CD with a MSFT tool -- self extracting .zip with readme.

[Now playing - Uninstall]

Saturday, May 19, 2007

Anime — Gakuen Utopia Manabi Straight

About the only series that came out in Q1 that I followed -- and with great thanks to Anonymous who actually finished the subs, after it had been seemingly abandoned 3/4 of the way through.

Girls' high school, slice of life in the mid-2030s. Population declines, and schools are emptying. But one girl is not going to let her new school go quietly into that good night.

Cute, harmless. In all, simply nice.

Exceptions and their treatment

Depending on your preferred language, exceptions can vary from being scary black magic and an indication that all has gone horribly wrong ('C', even with Windows Structured Exception Handling), through to being familiar to the point of contempt (Java, C#).

As a rule of thumb, exceptions should indicate that, well, something exceptional has happened -- not that a user has said "No" rather than "Yes" at a particular interaction, but rather that a part of the infrastructure has not lived up to its expected contract (memory cannot be allocated; a network connection cannot be made). There is, even so, a grey area in the middle -- things like "File not found", where the low level code is reacting helplessly to the caller having not lived up to its side of the bargain -- where an exception generalises the concept of an error code.

When should an exception be raised (i.e. when should a throw statement appear in code)? When something happens that the code at this point cannot sensibly deal with because it can't see the big picture.

When should it be caught? Two answers here -- first, when the code reached by the stack unwinding knows enough about what it is doing to be able to respond sensibly to the problem ("This file does not exist, please choose another"); secondly, when the code knows enough about what is going on that it can describe the error in a better fashion to the code higher up -- a process of catch, augment (or wrap), and re-throw.
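The second answer, catch, augment and re-throw, can be sketched in C++ as below. Every name here (the port parser, the settings reader) is hypothetical and exists only for illustration; the point is that the low-level routine throws because it cannot see the big picture, and the layer above adds the context it does have.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical low-level routine: it only knows it was handed bad text.
int parse_port(const std::string& text)
{
    if (text.empty()) {
        throw std::invalid_argument("empty value");
    }
    int port = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            throw std::invalid_argument("not a number: " + text);
        }
        port = port * 10 + (c - '0');
        if (port > 65535) {
            throw std::out_of_range("port too large: " + text);
        }
    }
    return port;
}

// Hypothetical mid-level routine: it knows which setting was being read, so
// it catches, adds that context, and re-throws for the code higher up.
int read_port_setting(const std::string& setting_name,
                      const std::string& raw_value)
{
    try {
        return parse_port(raw_value);
    } catch (const std::logic_error& e) { // base of both exceptions thrown above
        throw std::runtime_error(
            "bad setting '" + setting_name + "': " + e.what());
    }
}
```

Code further up the stack now sees which setting was at fault, not just that some string somewhere failed to parse.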

So, what happens in the middle?

In some cases, what has happened is that the system has gone so horribly wrong that we might as well throw up our hands, and let the process terminate; in others, what has happened is essentially trivial (like file not found) so we really want to be able to pick up and continue.

How do we characterise the behaviour of the system, then, when we perform an operation that may throw?

There is an established terminology for this. We say that an operation has a basic level of exception safety if the system remains in a consistent and usable state (no leaks, all objects valid) after the exception. The operation is strongly exception safe if after the exception is handled, the system has returned to the state before the operation began.

And some operations can be classed as not failing (and without them we would indeed be building on sand). Destructor/deallocator and assignment/swap operations must fall into this class -- without them we cannot reliably or safely perform any clean-up or recovery.

RAII (or using, or try/finally) is the idiom that (with no-fail deallocation) most cleanly supports the two weaker types of behaviour : this is where we can prevent resources leaking, and restore objects to a consistent state (if not always the original one).

The easiest way of ensuring strong safety, i.e. a roll-back to the original state, is to do all the operations with temporary variables. Only when all the tricky work has been done do we mutate the external state, using no-fail assignment or swap operations -- any exception that could happen, happens before the objects are changed from their original pristine state, and roll-back is a no-op.
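A minimal sketch of that temporaries-then-swap approach, assuming a hypothetical Roster class that must add either every name in a batch or none of them:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical class used only to illustrate the idiom.
class Roster {
public:
    // Adds every name or none of them: all the fallible work happens on a
    // temporary copy, and only a non-throwing swap touches the real state.
    void add_all(const std::vector<std::string>& new_names)
    {
        std::vector<std::string> updated(names_);          // may throw; names_ untouched
        for (const std::string& name : new_names) {
            if (name.empty()) {
                throw std::invalid_argument("empty name"); // names_ still untouched
            }
            updated.push_back(name);                       // may throw; names_ untouched
        }
        names_.swap(updated);                              // no-fail commit
    }

    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};
```

If a batch fails half-way, the temporary is simply discarded during unwinding and the roster is exactly as it was; the price is the extra copy.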

Working to improve the resilience of code under exceptions is another good reason to strive to separate responsibilities within the code -- a routine that has many effects, especially if some are dependent on others, can be more difficult to separate out into fallible and safe sections. As always, this is an ideal to be striven for, rather than an absolute -- consider the operation of popping a stack in C++, where the stack is mutated and an object (rather than an object reference) is copied.
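The stack case can be sketched as follows. std::stack keeps reading (top()) and removing (pop()) as separate operations; the hypothetical helper below combines them in the order that keeps the stack intact if the copy throws. It leans on an assumption worth stating: returning the local value is a move or an elided copy for std::string, so nothing can throw after the element has been removed.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Hypothetical helper: take the top element off a stack, returning it by
// value. The copy is made first, while the stack is still intact, and the
// removal happens only afterwards.
std::string take_top(std::stack<std::string>& items)
{
    assert(!items.empty());
    std::string value = items.top(); // may throw; stack unchanged if it does
    items.pop();                     // does not throw for std::string elements
    return value;                    // moved or elided, so no throwing copy here
}
```

A single pop() that returned the object by value would have to mutate the stack before the caller's copy completed, and an exception in that copy would lose the element for good.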

Responsibility Driven Web Design

It has been suggested as a rule of thumb that debugging and maintenance is twice as hard as writing the code in the first place -- so writing anything that is truly as clever as you can manage guarantees that you'll not be able to fix it.

Taking a responsibility led approach means that each unit that you write has a clear reason to exist, and does a well defined job. Tweaking its "mission statement" or making it adhere to it is a task that only affects the system locally -- reducing the fears that some well intentioned change will have some subtle side-effect in a distant part of the system.

But responsibility driven design is not just about organizing your objects (as I posted about here) in a conventional code library or executable. It can be a general philosophy of design.

But web design?

Yes. The web -- and in particular, the vast bulk of user agents (browsers) out there -- has matured greatly in recent years, and support for techniques that let us apply a separation of responsibilities approach is now almost universal.

When HTML over HTTP first emerged, it was aimed to fill a rather restricted role -- publishing scientific papers on-line, doing away with the need to mail-shot out Xeroxed pre-prints; and it filled that niche well. But then it burst out from the academic to the commercial (and hobbyist) fields. No longer were sketchily formatted grey pages with various sizes of Times Roman acceptable -- the guys from the DTP world came on-line and tried to force what had been a set of hints about layout into a pixel-perfect replica of the paper-oriented world they were familiar with.

And around the turn of the century, there was only one way to do that -- to abuse the <table> element to force a grid model into what was by its nature a fluid medium. Tables within tables within tables, propped up by 1x1 transparent spacer gifs, borders supplied by setting the background in some of the cells, became the standard -- and to be fair, with the version 3 and 4 browsers, at the height of the browser wars (remember <blink> and <marquee>? -- things seen that cannot be unseen), that was the best that could be done.

So, it worked -- after a fashion. But, to paraphrase "d00d -- where's my content?"

Updating the content of a site generated in this style was painful -- and rebranding became an exercise in starting again from scratch.

This is where separation of responsibilities comes in.

Content and how it is structured -- the semantic value of the page -- can be separated from its presentation; and the behaviour of the page separated from both of them, by using the appropriate technologies for each; respectively HTML (or XHTML) for the content, CSS for the presentation, and (unobtrusive) JavaScript for the behaviour.

Separating the presentation out into CSS gives two immediate benefits -- first, it makes you think about how the meat of your page is structured; and second, it means that you are not forcing a 600-1000 pixel wide design onto the users of small form factor devices, and can see what users of text-only browsers (including text-to-speech) will see, gaining an immediate improvement in accessibility.

To assist with moving to the non-presentational page, writing and validating pages against XHTML 1.1 is good since it has almost none of the presentational tags and attributes found in HTML 4.01 (though you should be aware that there are issues regarding the RFC-compliant correct MIME-type for serving this dialect -- cutting a long story short, you should theoretically only serve it as XML, but IE can't handle that).

These days, design can be targeted at browsers with good CSS support -- Mozilla browsers from ~2003 up, Opera 8+, Safari and IE7. Then you can fix up for IE5-6 by using IE's documented conditional comment mechanism to feed extra CSS and/or Dean Edwards' IE7 scripts to adjust to fit. Earlier browsers -- far less than 1% of the web these days -- can be protected from all this by their lack of CSS support, and get the text-only version (perhaps with colour and font styling).

The joys of RAII

A topic which, as Michael Caine might put it, "Not a lot of people know that". Alas.

One idiom which is almost unique to C++ amongst the languages in common use (for practical purposes this is defined as 'C' and its descendants -- C++, Java, C#; though it equally applies to Ruby or Python) is the concept of Resource Acquisition Is Initialisation (RAII). That is to say, when a resource -- whatever its type -- is acquired, it should be part of the initialisation of an object. Then, using the stack-based scope for variables, we can make the corresponding release operation happen naturally on exiting that scope, even during the stack unwinding following an exception*.

Managed languages (all those named above other than 'C' and C++) have separated out heap memory as a special (albeit frequent) case of resource allocation, handed the problem of tidying up to a garbage collector running inside the VM on which the code executes, and made the deallocation of heap objects a non-deterministic process. For other types of resource, there are special idioms -- like IDisposable and the using construct in C# (or, if all else fails, try/finally).

In C++ all resources are treated equally, and all have deterministic points where clean-up can happen -- at the end of a {} delimited scope.

Need to release memory -- allocate it as a std::vector<> (for arrays) or std::auto_ptr<> (for a singly owned single object) or similar smart pointer in the appropriate scope.
Need to release a COM object -- wrap it as a CComPtr<class T> template (ATL has a lot of these useful features kicking around for general Win32 native C++ programming). (Just be careful to release COM itself at the end of an enclosing scope: locals are destroyed in reverse order of construction, and every COM pointer must be gone before COM itself is shut down.)
Need to release some other resource (HANDLE, HINTERNET, GDI Object...) -- find (or, if you have to, write) another class to contain the resource and whose destructor frees it.

The classic is, of course, the MFC CWaitCursor -- create one at the start of a long-running block of code, and the cursor will show an hourglass over your application main window, until it is restored in the destructor.
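A wrapper of this kind need not be elaborate. What follows is a minimal sketch rather than anything lifted from MFC or ATL: ScopedResource and the g_open_resources counter are hypothetical names, with the counter standing in for a real HANDLE, lock or cursor. It shows the release happening even when the scope is left by an exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource: a counter stands in for a HANDLE, lock, cursor, etc.
static int g_open_resources = 0;

// Acquire in the constructor, release in the destructor.
class ScopedResource {
public:
    ScopedResource() { ++g_open_resources; }   // acquisition is initialisation
    ~ScopedResource() { --g_open_resources; }  // release; must not throw

    // Non-copyable, so the resource has exactly one owner.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;
};

// Returns the number of resources still open after the scope has been
// left by an exception rather than by falling off the end.
int open_after_exception() {
    try {
        ScopedResource guard;
        assert(g_open_resources == 1);
        throw std::runtime_error("something went wrong");
    } catch (const std::runtime_error&) {
        // guard's destructor has already run during stack unwinding
    }
    return g_open_resources;
}
```

There is no explicit release call anywhere in open_after_exception(); the closing brace of the try block is the clean-up point.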

If you are in a position of having to write native C++, then rather than considering it the equivalent of working in the Dark Ages of manual memory management, look to the power that the language offers you to solve the more general resource management problem at a single point.

*Related to this is the reason why destructors must never throw -- throwing during a stack unwinding leads to program termination, no saving throw; only the chance to do some last ditch tidying up.

Responsibility Driven Design

The classical approach to object based analysis and design is the process of taking your problem or solution statement and picking out the nouns (objects) and verbs (methods). The objects that emerge are considered as data, with a bunch of associated methods to operate on those data members.

As a method of organising your problem into code this works; but it can make the code more cumbersome than it need be. A common point at which this sort of problem arises is the point where two or more objects interact through a given verb, and it's not obvious where to put the verb. And then when it comes to implementation, side effects or necessary pre-conditions for doing whatever verb have to be considered.

And finally, when we are coding a verb -- an algorithm -- associated with one object, we are working at a level below the object design, where things start to become purely procedural again -- this can lead to code (which I am sure that we have all seen examples of) such as :

class ObjectUser {
    void doSomethingWithData() { // return type and argument list immaterial
        // do some stuff
        bool state = someObject->getSomeState(); // 'switch' is a reserved word
        if (state) {
            // do something not involving this
        } else {
            // do something else also not involving this
        }
        // do some more
    }
};

where we inspect the state of one object to do something without making any reference to the calling object's state.

This sort of code, once it exists, is amenable to refactoring, moving the if/else code into a new method on the class to which someObject belongs. You may even find that after doing this, the whole need for the public getSomeState() method may vanish. Even if someObject is of a class we don't own, specialising or wrapping it can still be a good move. We have a whole book (and web-site) on how to do that with code.
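As a rough illustration of where that refactoring ends up, here is a sketch in which the if/else has moved onto the class that owns the state. SomeObject, describeState() and the returned strings are all hypothetical stand-ins; only the shape matters.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the class to which someObject belongs.
class SomeObject {
public:
    explicit SomeObject(bool state) : state_(state) {}

    // The if/else that used to live in ObjectUser now sits next to the
    // state it depends on, so no public getSomeState() is required.
    std::string describeState() const {
        if (state_) {
            return "on";
        } else {
            return "off";
        }
    }

private:
    bool state_;
};

class ObjectUser {
public:
    explicit ObjectUser(const SomeObject* someObject) : someObject_(someObject) {
        assert(someObject_ != nullptr);
    }

    std::string doSomethingWithData() const {
        // do some stuff, then tell the object to do the work itself
        return "state is " + someObject_->describeState();
    }

private:
    const SomeObject* someObject_;
};
```

ObjectUser no longer inspects anything; it simply asks for the service it wants.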

But, wouldn't it be nice to get the same effect ahead of time, during the analysis and design stages?

Responsibility driven design is one technique that can help with this. Rather than the historic data+algorithm approach for looking at objects, the approach is to look at objects for their roles within the system and their responsibilities in making it work. It also helps us focus not on the static anatomy of the system (such as inheritance paths), but rather on the dynamic behaviour of the objects in their context, and emphasises the encapsulation part of object design.

In this approach an object is defined by what it knows, and what services it performs. Those services may be performed by passing parts of the tasks to other objects that this first object knows about; but the clients need not know about that. All they need to see is an interface, describing the services.

Thinking about our objects this way, we can take a rough partition of the system into objects -- perhaps a noun and verb model -- and start to assign roles : does this object know things? coordinate activities? perform services? interface with things beyond the system? Then we can perform what is in effect a dry run through the model, based upon known use cases. This will show up those cases where an object may actually need to know things not initially available to it. It will also reveal where objects are being asked to do too many different things (and occasionally, those doing too little to justify themselves).

Objects that have more than one focus of responsibility are candidates for being broken up. Tasks that are not strictly related to their responsibility should be moved to objects for whom they are better suited.

After a couple of iterations -- which may be done on the fly, while running through the use cases -- the changes should settle down, and the web of interactions should have become simpler.

With your objects defined by their roles (interfaces), and the unnecessary interactions untangled, the design is also readied for test driven development -- you have interfaces for mock objects to follow, and have reduced the amount of work that a mock will have to perform.
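To make that last point concrete, here is a small sketch of a role expressed as an interface, a coordinator that knows only the role, and a mock that follows the interface. Notifier, OrderProcessor and MockNotifier are invented for the illustration and come from no particular system.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Role: "something that can deliver a message".
class Notifier {
public:
    virtual ~Notifier() {}
    virtual void notify(const std::string& message) = 0;
};

// A coordinator that knows only the role, not the concrete class behind it.
class OrderProcessor {
public:
    explicit OrderProcessor(Notifier& notifier) : notifier_(notifier) {}

    void process(int quantity) {
        if (quantity <= 0) {
            notifier_.notify("rejected");
        } else {
            notifier_.notify("accepted");
        }
    }

private:
    Notifier& notifier_;
};

// Mock for tests: it only has to record what it was asked to do.
class MockNotifier : public Notifier {
public:
    void notify(const std::string& message) override {
        messages.push_back(message);
    }
    std::vector<std::string> messages;
};

// Runs the processor against the mock and returns what the mock recorded.
std::vector<std::string> run_against_mock() {
    MockNotifier mock;
    assert(mock.messages.empty());
    OrderProcessor processor(mock);
    processor.process(3);
    processor.process(0);
    return mock.messages;
}
```

Because the interface is narrow, the mock is a handful of lines; a wider interface would mean a correspondingly fatter mock.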

AD FS + Windows Integrated Authentication = Trap for the unwary

There is a quirk in the use of AD FS in its default intranet mode that may come as a surprise to the unwary user.

By default, AD FS sets up the Federation Server to take Windows Integrated authentication; out of the box, it also installs client certificate authentication, but you have to actively enable that by editing the root logon page at /adfs/ls/clientlogon.aspx

<%@ Page language="c#" AutoEventWireup="false" ValidateRequest="false" %>
<%@ OutputCache Location="None" %>
<% Context.Response.Redirect("auth/integrated/"+Context.Request.Url.Query); %>

to do something other than redirect to the Integrated authentication page.

Now, the interesting thing about Windows Integrated authentication is that having authenticated explicitly, your browser is already doing single sign-on (SSO) on your behalf every time you touch an authenticated URL.

So, having authenticated once to the Federation Service, you are henceforth silently re-authenticated for the rest of the browser session.

"What has this got to do with AD FS?" you may be asking.

Well, AD FS is not only an SSO mechanism; for applications spread across many hosts, it also offers a single sign-off capability.

So let's talk our way through what goes on when all of these are happening at once:

1. You go to an AD FS protected application
2. It redirects you to the Federation Server
3. That does integrated authentication, popping up a dialog box at the browser
4. Having logged on, you're redirected back to the application
5. Later, you log off that application -- AD FS directs you back to the Federation Server
6. The Federation Server sends you a page that removes its session cookies, and contains images whose URLs are part of AD FS on the application, so they can remove the session cookies
7. Now you go back to the application page, and--
8. AD FS redirects you to the Federation Server which--
9. Silently reauthenticates you and sends you back to the application

Step 9 may come as a surprise to the unwary -- unless you have some obvious indication in the web application (such as a last sign-on time), the log-off appears not to have happened.

This is not a bug -- it is an inevitable consequence of an old SSO model with no sign-out capability being used to bootstrap the new federation system; and will happen with any silent re-authentication scheme (such as the alternative authentication mechanism provided in the install, which uses client certificates).

When re-authentication is not silent -- explicit form driven authentication at a Federation Service Proxy, for example -- step 9 is blatantly obvious. This may lull the user into complacency.

Links for 19-May

More catch-up…

JavaScript -- the Big Divide -- a presentation

Remove Duplicate Rows From A Text File Using Powershell

Code Access Security and Bitfrost -- can Code Access security be made usable enough to work?

Powershell on Rails

Six cool things you can build with OpenID

Learn C# 3.0 - the easy way

SQL Server 2005 Security Best Practices

Punching holes into HTTP.SYS

Top 6 List of Programming Top 10 Lists

Our Dirty Little Secret

Wonder of the When-Be-Splat -- very nice compact Ruby idiom

Higher-order Messaging -- the next logical step after higher-order functions

Cross-browser scripting with importNode() -- abstracting away browser dependencies for AJAX style applications

Yet Another JavaScript Library Without Documentation™ -- From the man who brought us the ie7scripts for CSS in IE5 & 6 -- a patch for current browsers' subtly different DOM implementations.

Decrypting CardSpace Tokens in partial trust

Reuse is not Usable

Going Commando -- Put down that mouse

Software Development as a Collaborative Game -- remembering the fun

Software Projects as Rock Climbing -- that sort of collaborative game

Machine tags and ISBNs -- interesting application of folksonomy

Friday, May 18, 2007

Lightweight Unit Testing for C, C++ and C++/CLI

Probably the simplest tool for this purpose is MiniCppUnit -- no install needed, 2 source files, ~ 500 lines and all in cross platform C++. This tool was in fact generated as a response to quite how heavyweight the more familiar CppUnit is.

The portability of the framework is extremely useful when code is being developed for a non-Windows platform, but you would like the platform independent body of the code in a unified Windows build for monitoring, as well as on the other platform. The one downside as it comes out of the box, though, is that it doesn't build under C++/CLI due to its use of native C++ exception behaviour (i.e. it uses the fact that in C++ you can throw anything). This wrinkle is simply fixed, however.

Guarded by an appropriate #ifdef/#else/#endif, do the following

in the .hxx file,

incorporate the .Net framework with

#using <mscorlib.dll>

provide an alternative definition of the TestFailedException class:-

public ref class TestFailedException : public System::ApplicationException

and catch System::Exception^ rather than TestFailedException&

in the .cxx file

throw gcnew TestFailedException();

rather than just TestFailedException(); when building as C++/CLI.
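Pulled together, the guarded changes might look something like the sketch below. It assumes __cplusplus_cli (the macro the Microsoft compiler defines under /clr) as the guard. MINICPPUNIT_THROW_FAILURE and MINICPPUNIT_CATCH_TYPE are hypothetical helper macros of my own, not part of MiniCppUnit, and the native TestFailedException here is an empty stand-in for the framework's own class.

```cpp
#include <cassert>

#ifdef __cplusplus_cli
    // Building with /clr: use a managed exception type.
    #using <mscorlib.dll>
    public ref class TestFailedException : public System::ApplicationException {};
    #define MINICPPUNIT_THROW_FAILURE() throw gcnew TestFailedException()
    #define MINICPPUNIT_CATCH_TYPE System::Exception^
#else
    // Native build: a plain class thrown by value.
    class TestFailedException {};
    #define MINICPPUNIT_THROW_FAILURE() throw TestFailedException()
    #define MINICPPUNIT_CATCH_TYPE TestFailedException&
#endif

// Returns true if the failure raised inside was caught.
bool failure_is_caught() {
    try {
        MINICPPUNIT_THROW_FAILURE();
    } catch (MINICPPUNIT_CATCH_TYPE) {
        return true;
    }
    return false;
}

// Usage: the same source behaves identically under either build.
void self_check() {
    assert(failure_is_caught());
}
```

Hiding the difference behind a pair of macros keeps the #ifdef in one place instead of at every throw and catch site.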

Links for 18-May

A bit of a catch-up:

CardSpace and decrypting Tokens

An Approach to Composing Domain-Specific Languages in Ruby

Giles Bowkett: The Business Case For Firefox

Bitwise Magazine:: What’s Wrong With Ruby?

New mailing list: HTML 5 Help

swfIR: swf Image Replacement

Five Principles to Design By

Debugging a Service on Windows Vista

The One Important Factor of programming languages

Lazylist implementation for Ruby

What Colour do you like your Objects? Pink or Blue?

Actions, Not Words

Are Web Interfaces "Good Enough"?

Your Code: OOP or POO?

Creating User Friendly 404 Pages

Graphic designers misunderstanding Web standards

XSS bestiary

Remove empty lines from a file using Powershell.

Why <video>?

W3C's new HTML blog

Web Typography Sucks -- presentation and links to web typography resources.

Alpha release of Adobe's Apollo cross-platform runtime.

Keep your cookies straight when using ADFS

Is Your Software Team Sticky? -- do they share a coherent vision?

Primary Keys: IDs versus GUIDs

IE 7 does not resize text sized in pixels -- or any other absolute size *sigh* : Squelching what might become an urban myth

The hidden delights of Unit testing

One of the things that has left this blog light of content of late is having been provided with internal blogging, wherein I try to enlighten my colleagues. Often the entries on that blog are collations of links to and through other blogs I read. Some are from experience. Like this one.

One part of the current development project has brought home to me quite how much serious unit testing results in cleaner code -- and that the closer you strive for 100% coverage in the testing, the more incentive there is to write that clean code.

The closer to 100% you strive to get, especially with a coverage tool (such as gcov) that does branch coverage rather than just line coverage, the more it squeezes your code. At its most brutal, the more code you have, the more tests there are to write to cover it all -- the incentive is there to make the code tighter, just to reduce the amount of work to do for completion.

Much of the 'C' code being written contains routines that are explicitly each a little state machine. As such the structure of a routine is along the lines of

check preconditions
determine "state"
"switch" on the current state
tidy
return outcome

Some of the precondition checks are assert() but others cannot be (so will include an exit-on-failure); and the switch may not be a simple flat one -- some cases may have sub-cases; and some might overlap, in a structure like

where a and b are independent.

However, even if you're not explicitly thinking of the code as a state machine as such, the routine structure is still quite generic.
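A sketch of a routine with that generic shape, written C-style, follows. The Session structure, the states and the outcomes are all made up for the illustration; it is not taken from the project code.

```cpp
#include <cassert>

// Hypothetical states and outcomes for a small session-like routine.
enum State { STATE_IDLE, STATE_ACTIVE, STATE_DONE };
enum Outcome { OUTCOME_OK, OUTCOME_BAD_INPUT, OUTCOME_FINISHED };

struct Session {
    bool started;
    int  remaining;   // units of work left
};

Outcome step(Session* session) {
    // 1. check preconditions: a contract violation is asserted ...
    assert(session != nullptr);
    // ... but bad run-time data needs an exit-on-failure instead
    if (session->remaining < 0) {
        return OUTCOME_BAD_INPUT;
    }

    // 2. determine "state"
    State state = !session->started      ? STATE_IDLE
                : session->remaining > 0 ? STATE_ACTIVE
                                         : STATE_DONE;

    // 3. "switch" on the current state
    Outcome outcome = OUTCOME_OK;
    switch (state) {
    case STATE_IDLE:
        session->started = true;
        break;
    case STATE_ACTIVE:
        --session->remaining;
        break;
    case STATE_DONE:
        outcome = OUTCOME_FINISHED;
        break;
    }

    // 4. tidy (nothing to release in this sketch)
    // 5. return outcome
    return outcome;
}
```

Each arm of the switch, plus the early exit, is a path that a branch coverage tool will expect a test to reach.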

Coverage testing

A set of unit tests can make sure that expected inputs map to expected outputs, both positive and negative; coverage helps tell you if you have "enough" tests. It answers the question "Has all that code been exercised, yet?" (if not the related "if it doesn't get used, why did you write it in the first place?").

The first thing that a set of obvious positive tests will show are the bits that are difficult to reach. The obvious one is handling exception states -- and here automation and good mocks in the test framework, or your own wrapper to it, are essential. After all, exceptions are meant to be, well, exceptional, but here need to be generated on demand.

With those out of the way, the real difficult-to-reach corner-cases of the logic stand out -- and with only those to concentrate on, you're either faced with writing a lot of tests to reach them; or figuring a way to simplify the code so you don't have to.

It is often tempting to write code like this:

but, damn it, if arranging the case a and c is hard work, you don't want to go through the slog with b as well. Factoring out the special case goes from being something you could do, if you had the enthusiasm, to something you want to do, because it's less work than writing the extra tests. "Don't Repeat Yourself" becomes positively encouraged.

Coverage types

The code metric you use is important in how much benefit you can derive. For a first pass, NCover isn't too bad. But it only counts line visits, being as it is an instance of the profiling API for .Net. In particular if you have code like--

NCover will never show you that you're missing the case of zero and negative values of a. One of the up-sides of working in 'C' on a *nix platform is that gcov is available. And that will take code like--

and distinguish between whether a or b triggered the do something -- 100% in NCover usually isn't more than 90-odd% in gcov.
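To illustrate the difference between the two metrics, here is a hypothetical pair of routines in the spirit of the two cases just described. The function names are invented, and the comments describe how a line-counting tool and a branch-aware one (gcov with branch output enabled, say) would be expected to see them.

```cpp
#include <cassert>

// Everything on one line: a single call with a > 0 marks the line as
// visited, so a line-counting tool is satisfied even though the
// a <= 0 path was never taken.
int clamp_positive(int a) {
    int result = 0;
    if (a > 0) result = a;
    return result;
}

// Compound condition: either operand can trigger the body. A branch-aware
// tool tracks the outcome of each operand separately.
int count_triggered(bool a, bool b) {
    int count = 0;
    if (a || b) {
        ++count;   // "do something"
    }
    return count;
}

// The calls needed to take every branch, not merely to visit every line.
void exercise_all_branches() {
    assert(clamp_positive(5) == 5);
    assert(clamp_positive(0) == 0);
    assert(clamp_positive(-3) == 0);
    assert(count_triggered(true, false) == 1);
    assert(count_triggered(false, true) == 1);
    assert(count_triggered(false, false) == 0);
}
```

Just the first and fourth calls would be enough to visit every line; the rest are there only because branches are being counted.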

Squeezing out the logic

Here's a real example of code improvement in making the last step to 100% branch coverage

I had 100% in NCover; but gcov reminds me that I don't cover all the bases -- because have_token and need_token aren't independent variables : if you don't need the token, you should never have one. So, what to do when aiming for the 100% mark?

The routine here started in a state where I just enumerated all possible cases (there are more than just these), handling them individually in some sort of logical order. Now, the unit tests I already have provide me a framework to check that the code is still doing what I mean it to do when I refactor; so I can look at the code and see that what I have is actually of the form

or, more simply

refactor, re-run the tests and see that the simpler code is still right.

Similarly, code guarded by an if clause, where the else is never executed under any input you can generate, perhaps because the assert() defined contract of the method or its callers enforces the constraint, can be simplified to an assert() of the condition and an unconditional block. And you get that better code because you've made yourself go the last little bit.

In the case above, user input could have, but not need, the cookie; we can't assert -- but we don't need to write (though we can) another test case to prove that is harmless, because that is just another flavour of the "else".
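The if-to-assert simplification can be sketched as a before and after pair. The routine and its contract (count is never negative) are hypothetical, chosen only to show the shape of the change.

```cpp
#include <cassert>

// Before: the else arm can never run if every caller honours the contract
// (count >= 0), so no test can reach it.
int total_before(int count, int unit_cost) {
    if (count >= 0) {
        return count * unit_cost;
    } else {
        return 0;   // unreachable under the contract
    }
}

// After: the contract is stated once as an assertion and the block that
// was guarded becomes unconditional.
int total_after(int count, int unit_cost) {
    assert(count >= 0);
    return count * unit_cost;
}
```

The existing unit tests should pass unchanged against the second form, which is what makes the refactoring safe to do.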

Thursday, May 17, 2007

Decisions, decisions

When I'm doing stuff for my own interest and enjoyment, it's almost certainly client-side applications. And unless it's something that is very definitely targeted at a particular set of OS specific functionality, I like it to be cross-platform and easy for a naive end-user to run. This was a particular pull of Java — though it was soon shown to be very definitely “debug everywhere” when the two main browser JVMs (Microsoft and Netscape) interpreted the thumb-width of a scrollbar differently (inclusive or exclusive of the range — or “lol, box-model”). And the API, even for AWT, was far nicer than any of raw Win32/Win16, MFC or OWL, back in the day.

So then the progression went AWT to GWT (a light-weight widget set that had all the missing widgets from AWT) to Swing… and there it stayed. But that did mean programming in Java, which, over the years, I've come to find lives at the wrong level of abstraction : it doesn't afford the ruthless power that C++ does (templates, multiple inheritance), but without giving very much in return (generics are latecomers, and comparatively weaksauce; there's nothing like mix-ins; and everywhere a lot of the same scaffolding that C++ would require).

Growing dissatisfaction pushed me to look at native (as opposed to bytecode) toolkits, where I could get to use C++, eventually settling on FOX, over wxWidgets. The downside of FOX is that it was not built for localisation (too many strings baked in), and the event handling model seemed to fight the language compared with Java's X-style callback registration (wxWidgets lost on having what felt like a more opaque layout model). But that meant compiling on Windows and Linux, and having to tweak platform dependent system header files (and hoping that other *nix-like platforms would work).

The moving target that is C# provided a bit of a distraction — I'll learn it for work, but are all new languages going to be just warmed-over Java? It didn't matter that Mono was providing portable CLR GUI by stages, when the language (C#) had little appeal in and of itself. So it wasn't until stumbling into Ruby and Python in the last few months that I discovered some more shiny in terms of language.

But what to do for client-side applications? Python with wxWidgets? Ruby with FOX? and what about naive users? Jython 2.2 in stand-alone mode? Pity about JRuby's sprawling nature. But what about AllInOneRuby and its friends? I had just about come down on standalone Jython, when the whole silverlight business blew up.

So now I think I know my GUI toolkit. And while we only have IronPython out (and ported to Mono) at the moment, that both Python and Ruby should soon co-exist in the DLR means that it should be possible to do what works best for the application at hand. When an object is an object is an object, much fun should be available.

IronPython and PyFit

Problem — PyFit uses the parser module.

Workround —

remove redundant import compiler from fit\TypeAdapter.py
remove import compiler from fit\taBase.py
remove seq_types, map_types, oper_types from fit\taBase.py
remove _safeAssemble from fit\taBase.py
remove _safeEval from fit\taBase.py and lose the ability to input non-scalar types (string being scalar); or replace its body with return eval(s) and lose the protection against wild input (sorta-OK if your FitNesse wiki is on an internal network).

Either approach breaks various of the PyFit unit tests, but does permit you to run IronPython 1.x directly, so long as you keep your cell types under control.

Special case - IronPython 2.0alpha1

This currently has problems with the type() function used in CellHandlers.isValid(). Unroll the in test as follows

which passes the unit tests.

I believe that the type() problem is known in IronPython 2.0α1, but for the record, a minimal test case, in case it persists into beta

yields

C:\Documents and Settings\Steve\My Documents\code\python>\IronPython-2.0A1\ipy.exe ipy2.py
Traceback (most recent call last):
  File ipy2.py, line 5, in Initialize
  File , line 0, in _stub_##14
  File ipy2.py, line 3, in inspect
  File , line 0, in ContainsValueWrapper##18
  File , line 0, in _stub_##19
SystemError: Object reference not set to an instance of an object.

More IronPython

Since my earlier post on the topic, I find that I've been picked up by the nifty aggregator site IronPython URL's.

At that point, I'd just started using the implementation as part of the build process for the project I'm working on at the moment — while the bulk of the choreography is handled by running devenv against the main solution, the interstitial work is done through the pre- and post-build events. In most cases, .BAT-file behaviour is enough to just invoke an executable or two with a simple command line, but anything with string handling or iteration now gets pushed into IronPython.

Why IronPython in the build? Well, given that we're building with DevStudio 2005, .Net 2.0 is already installed, so IronPython's xcopy-style install means that we can put it into a project's tools folder without having to perform any change-of-environment on the build machines (a big no-no).

What sort of tasks are we doing?

Quite a variety —

Doing everything that needs version stamping: from creating AssemblyInfo.cs files from templates, through web.config references to the generated assemblies, to the installers and merge modules.
Doing pretty much everything in managing installer creation: from driving Wix to create the initial single language installers, through version stamping them, to extracting the language transforms and embedding them into the master installer.

From that base, I've now moved into using IronPython as a tool for helping our system test team put scripts together. Unfortunately even 2.0Alpha1 doesn't have an implementation of parser, so it can't be directly used with FitNesse and PyFit; but even so, it still makes a great utility for CPython scripts to invoke to get at awkward or tedious bits of the Win32 APIs like

ACLs on files and registry keys
the current thread identity
Assembly full-names

or even less esoteric things like DLL versions (even though pulling those out of PE format directly isn't that difficult) -- and without needing to check that all the necessary CPython Win32 add-on libraries are present.

It has also proved useful for quickly putting together scripts to drive web applications, by giving simple script-level access to all the System.Net and System.Xml facilities.

tl;dr — IronPython for the win for its xcopy-style install and out-of-the-box access to Windows APIs.

Film — El Topo

Having heard about it first many years ago in the pages of A&E, I finally got a chance to see Jodorowsky's enigmatic film of the Weird West.

Starting off as what seems to be an ultra-violent Clint Eastwood style spaghetti western, it turns into a Zen quest, a tale of redemption, and then starts the cycle off all over again.

My main thoughts were to wryly observe where sensibilities have shifted in the last 35 years; and that I'm not sure what the two women with male voices were meant to represent.
