Friday, May 26, 2006

When 100% isn't really 100% - updated!

The latest Journal of Usability Studies (Vol 1, Issue 3) includes an article by James Lewis and Jeff Sauro of IBM and Oracle, respectively, entitled "When 100% Really Isn't 100%: Improving the Accuracy of Small-Sample Estimates of Completion Rates". The article - very clearly written, with nice 'take-aways' for the non-mathematical - provides a neat look at alternative ways of estimating task completion rates from small-sample usability tests.

The basic idea of the article is that usability tests:
  1. Typically involve small participant numbers
  2. Report task completion rates as a primary success measure
  3. Typically calculate task completion rates as the number of successes / number of attempts (x/n)
When results hit the extremes - e.g. 0% or 100% - we face the awkward choice of reporting an unlikely estimate: complete success or complete failure. Since experience tells us this is generally not the case, what alternative methods do we have for estimating the likely rate of task completion?

The authors compare a number of different estimation methods - Laplace, Wilson, Jeffreys, MLE (x/n) and one of their own construction, 'Split-difference' - and recommend particular alternatives to the x/n method for various sample sizes and MLE values.
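For the curious, here's a minimal sketch (in Python; the function names are my own) of the point estimators being compared. These are the standard textbook forms - go to the article itself for guidance on which to prefer at a given sample size.

```python
# Point estimates of a task completion rate from x successes in n attempts.

def mle(x, n):
    """Maximum likelihood estimate: the raw x/n completion rate."""
    return x / n

def laplace(x, n):
    """Laplace's 'rule of succession': add one success and one failure."""
    return (x + 1) / (n + 2)

def jeffreys(x, n):
    """Jeffreys estimate: add half a success and half a failure."""
    return (x + 0.5) / (n + 1)

def wilson_point(x, n, z=1.96):
    """Wilson-style adjusted estimate: add z^2/2 successes and z^2 attempts."""
    return (x + z * z / 2) / (n + z * z)

# The extremes are where these methods earn their keep: 6/6 no longer
# has to be reported as a flat 100%.
for x, n in [(6, 6), (0, 6), (4, 6)]:
    print(f"{x}/{n}: MLE={mle(x, n):.3f}  Laplace={laplace(x, n):.3f}  "
          f"Jeffreys={jeffreys(x, n):.3f}  Wilson={wilson_point(x, n):.3f}")
```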

This article is well worth a read and should provide you with some extra depth to your analytical toolkit.

----------------------------------------------------------------
As a follow-on from this, let's assume you've run a usability test and have the following:
Task 1: 4/6 successful completions = 66.67% success
Task 2: 4/5 successful completions = 80% success
Task 3: 6/8 successful completions = 75% success
[Note: typically, yes, each task would have the same number of users, but I'm making this up, so I can say what I want.]

The journal article tells us that in reality we can say the following:

Task 1: the real completion rate at launch should be somewhere between 21% and 99.3%, but we expect it to be around 60%;
Task 2: the real completion rate at launch should be somewhere between 25.7% and 100%, but we expect it to be around 67%;
Task 3: the real completion rate at launch should be somewhere between 34.3% and 99.5%, but we expect it to be around 67%; and
we can be only 95% confident that even those ranges contain the true completion rate.
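If you'd like to produce this kind of range yourself, here's a minimal Python sketch of one common small-sample approach - the adjusted Wald (Agresti-Coull) interval. Its point estimates match the 'expected' figures above; the ranges you get will depend on exactly which interval variant you use, so don't expect them to line up to the decimal.

```python
import math

def adjusted_wald(x, n, z=1.96):
    """95% adjusted Wald (Agresti-Coull) interval for a completion rate.

    Adds z^2/2 successes and z^2 attempts, then applies the usual
    normal-approximation interval and clips it to [0, 1].
    """
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), p_adj, min(1.0, p_adj + half)

for label, x, n in [("Task 1", 4, 6), ("Task 2", 4, 5), ("Task 3", 6, 8)]:
    lo, mid, hi = adjusted_wald(x, n)
    print(f"{label}: {lo:.1%} - {hi:.1%}, expected around {mid:.1%}")
```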

Kind of depressing really, isn't it.

Note: if you use around 30 test subjects instead, and maintain the same success ratios for each task, then you could expect the following:

Task 1: 47.7% - 81.9% with an expected success ratio of 64.8% (30 users)
Task 2: 61.44% - 91.75% with an expected success ratio of 76.6% (30 users)
Task 3: 56.82% - 87.82% with an expected success ratio of 72.32% (32 users)

So you can get a much narrower range for your estimate, but 30+ users is a significant undertaking for a usability test.
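Reusing the adjusted_wald sketch from above, the narrowing is easy to demonstrate (again, exact figures will vary with the interval variant):

```python
# Same 4-in-6 success ratio at two sample sizes: the 95% interval
# roughly halves in width when n grows from 6 to 30.
for x, n in [(4, 6), (20, 30)]:
    lo, mid, hi = adjusted_wald(x, n)
    print(f"{x}/{n}: {lo:.1%} - {hi:.1%} (width {hi - lo:.1%})")
```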

Tuesday, May 23, 2006

Light-hearted aside: Wedding gifts can be way cool

You may all remember that I got married late last year. You may also recall that I'm a die-hard fan of the Sydney Swans. So you'll understand how thrilled I am at finally receiving our wedding gift from my wife's cousins, uncle & aunt - a Swans' player jumper signed by the entire 2005 Premiership-winning team!!!

Forget the toasters, folks, this is one awesome gift.

Wednesday, May 17, 2006

Multi-variate testing ready to burst forth....oh reeeaallly!?

I was reading this just now, and I'm bracing myself for a spate of useless statistical analysis from the field of Web analytics. My experience with the application of multi-variate testing goes back a decade and spans statistics, archaeology, marketing and, more recently, information architecture. Time and time again I see multi-variate testing wasted through a complete lack of multi-variate analysis.

Folks, it isn't enough to calculate the mean of several variables and pat yourself on the back for your multi-dimensional approach to research. Unless you perform analysis that examines the relationships between variables, you are wasting your time. Even something as simple as cross-tabulation will provide you with insights not available through standard summary statistics - however many variables you calculate them for, one at a time.

For example:
Out of 100 users...
65 found the interface easy to understand, 25 found it confusing, and 10 found it frustrating;
50 were able to locate the information they required, 35 were unable to find it, and 15 found the information but didn't recognise it.

So, does that mean that a majority of users found the interface easy to understand and were able to locate the information they required?...

What if I told you that, of the 50 users able to locate their information, 25 of them were the ones that found the interface confusing? How about if, of the 65 that found the interface easy to understand, 35 of them were unable to locate their information?
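To make that concrete, here's a minimal Python sketch. The per-user response pairs are hypothetical - invented to be consistent with the counts above - but a simple cross-tabulation of them immediately exposes what the one-variable summaries hide.

```python
from collections import Counter

# Hypothetical (clarity rating, finding outcome) pairs for 100 users,
# constructed to match the marginal counts quoted above.
responses = (
    [("easy", "found")] * 20
    + [("easy", "not found")] * 35
    + [("easy", "found, unrecognised")] * 10
    + [("confusing", "found")] * 25
    + [("frustrating", "found")] * 5
    + [("frustrating", "found, unrecognised")] * 5
)

crosstab = Counter(responses)

ratings = ["easy", "confusing", "frustrating"]
outcomes = ["found", "not found", "found, unrecognised"]

# Print the contingency table: rows are clarity ratings, columns outcomes.
print(f"{'':<12}" + "".join(f"{o:>22}" for o in outcomes))
for r in ratings:
    print(f"{r:<12}" + "".join(f"{crosstab[(r, o)]:>22}" for o in outcomes))
```

In this made-up but internally consistent breakdown, every user who found the interface confusing still located their information, while fewer than a third of the 'easy' raters fully succeeded - close to the opposite of what the one-variable summaries suggest.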

Anyway, it bugs the bejeesus out of me when I see this sort of thing.

And if you're thinking this guy may not be representative of the standard within the Web analytics fraternity, pick up a book - any book - on the subject, and I challenge you to locate any analysis that goes beyond this simplistic, superficial level.

If you find one let me know. I'll even buy a copy.

Tuesday, May 16, 2006

MMORPG - the role of the tutorial and self-help in complex systems

I recently started playing a new computer game - Eve Online - a massively multi-player online role-playing game, or MMORPG for short. The game is a space-based mixture of adventure, commerce, pirate-hunts, and character development, set in a galaxy far, far away. The game is rich, complex, and involves interacting with real players around the world to achieve common goals.

The game is FANTASTIC! I love this style of game. But that's not why I'm writing about it...

The complexity of the environment and the rules of engagement make it almost impossible to document simply in a user manual. The item database alone - the things you can buy, find, build, install, etc. - runs into the hundreds of pages, and a lot of the contents won't be relevant until months after you start playing.

The problem for the game designers, and new players, is that there's so much to know and learn and yet you can't force people to spend a couple of weeks poring over a user manual before they can start playing; you have to provide an 'in' to the early levels of the game.

The game designers (and I don't think this is unique to this particular game) have tackled the problem of how to introduce new players to such complexity in two interesting ways:
i) A fairly extensive tutorial that leads new players step-by-step into the environment - from configuring a ship, to trading & commerce, to combat and moving through space.
ii) A rich online chat built into the game that provides general how-to support for new players, and the opportunity to communicate with fellow players in real time.

With the growing prominence of rich internet applications - now in three flavours - and the increasing richness (ha) that derives from these interaction environments, I'm starting to see the need for a similar approach (i.e. an introductory tutorial) to Web applications. Whilst user research will uncover the primary tasks and objectives of the audience, and usability testing will uncover barriers to use, sometimes it will be necessary to provide a step-by-step run-through of the complex processes before users will 'get it'.

Unlike computer games, however, Web applications lack the initial commitment from the user that would make such a personal investment likely prior to actual use. So is there a limit to the complexity we can introduce into the interaction design of our Web applications before the up-front investment in time will be prohibitive to use?

Wednesday, May 03, 2006

More about product design...

Just to balance my karma a little after my last gripe about product designers, I have to make mention of the Fuji Xerox printer that recently came into my 'possession'. A3, double-sided, colour laser printer. Networked (via Ethernet), three trays... all the things you want in a printer.

Installation and configuration took under 10 minutes from opening the box to first printing. That includes time spent exclaiming over the size!!

For those interested in such things, it's a DocuPrint C2428.

Lovely; easy; efficient.