Friday, May 26, 2006

When 100% isn't really 100% - updated!

The latest Journal of Usability Studies (Issue 3, Vol 1) includes an article by James Lewis of IBM and Jeff Sauro of Oracle, entitled "When 100% isn't really 100%: Improving the Accuracy of Small-Sample Estimates of Completion Rates". The article, which is very clearly written and provides nice 'take-aways' for the non-mathematical, takes a very neat look at alternative ways of estimating task completion rates from small-sample usability tests.

The basic idea of the article is that usability tests:
  1. Typically involve small participant numbers
  2. Report task completion rates as a primary success measure
  3. Typically calculate task completion rates as the number of successes / number of attempts (x/n)
When a test produces an extreme result, e.g. 0% or 100%, we are left reporting an unlikely estimate: complete failure or complete success. Since we know from experience that this is generally not the case, what alternative methods do we have for estimating the likely rate of task completion?

The authors compare a number of different estimation methods - Laplace, Wilson, Jeffreys, MLE (x/n) and one of their own construction, 'Split-difference' - and recommend a particular alternative to the x/n method depending on the sample size and the MLE value.
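To make the difference between these estimators concrete, here is a minimal sketch using the usual textbook forms of MLE, Laplace, Jeffreys and Wilson. These formulas are my assumption about what each name refers to; the article's own definitions may differ in detail, and its 'Split-difference' method is not reproduced here. The function names are just illustrative.

```python
# Textbook point estimators for a completion rate,
# given x successes out of n attempts.
# Assumption: standard forms, not taken from the article itself.

Z = 1.96  # normal quantile for a 95% confidence level


def mle(x, n):
    """Plain observed proportion, x/n."""
    return x / n


def laplace(x, n):
    """Add one success and one failure."""
    return (x + 1) / (n + 2)


def jeffreys(x, n):
    """Add half a success and half a failure."""
    return (x + 0.5) / (n + 1)


def wilson(x, n, z=Z):
    """Add z^2/2 successes and z^2/2 failures."""
    return (x + z * z / 2) / (n + z * z)


if __name__ == "__main__":
    x, n = 5, 5  # every participant completed the task
    for name, fn in [("MLE", mle), ("Laplace", laplace),
                     ("Jeffreys", jeffreys), ("Wilson", wilson)]:
        print(f"{name:9s} {fn(x, n):.1%}")
```

With 5 out of 5 successes, only the MLE reports a flat 100%; each of the others pulls the estimate back towards 50% by a different amount.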

This article is well worth a read and should add some extra depth to your analytical toolkit.

As a follow-on from this, let's assume you've run a usability test and have the following:
Task 1: 4/6 successful completions = 66.67% success
Task 2: 4/5 successful completions = 80% success
Task 3: 6/8 successful completions = 75% success
[Note: typically, yes, each task would have the same number of users, but I'm making this up, so I can say what I want.]

The journal article tells us that in reality we can say the following:

Task 1: the real completion rate at launch should be somewhere between 21% and 99.3%, but we expect it to be around 60%;
Task 2: the real completion rate at launch should be somewhere between 25.7% and 100%, but we expect it to be around 67%;
Task 3: the real completion rate at launch should be somewhere between 34.3% and 99.5%, but we expect it to be around 67%; and
we can be only 95% certain that even those ranges will be accurate.

Kind of depressing really, isn't it?
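For anyone who wants to check the arithmetic: the figures above are consistent with a Wilson-style adjusted point estimate, (x + z²/2) / (n + z²) with z = 1.96, plus or minus z times a standard error computed on the original n, clipped to the 0-100% range. That is an inference from the numbers, not a formula quoted from the article, so treat the sketch below as an assumption. The function name `estimate` is hypothetical.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence level


def estimate(x, n, z=Z):
    """Return (low, point, high) as proportions for x successes in n attempts."""
    # Wilson-style adjusted point estimate
    point = (x + z * z / 2) / (n + z * z)
    # Assumption: the standard error uses the original n, which is what
    # reproduces the figures in this post. The usual adjusted-Wald
    # interval would divide by (n + z*z) instead and come out narrower.
    half_width = z * math.sqrt(point * (1 - point) / n)
    low = max(0.0, point - half_width)
    high = min(1.0, point + half_width)
    return low, point, high


tasks = {"Task 1": (4, 6), "Task 2": (4, 5), "Task 3": (6, 8)}
for name, (x, n) in tasks.items():
    low, point, high = estimate(x, n)
    print(f"{name}: {low:.1%} - {high:.1%}, expected around {point:.1%}")
```

Swapping in other values of x and n lets you see how quickly the range widens as the participant count drops.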

Note: if you use around 30 test subjects instead, and maintain the same success ratios for each task, then you could expect the following:

Task 1: 47.7% - 81.9% with an expected success ratio of 64.8% (30 users)
Task 2: 61.44% - 91.75% with an expected success ratio of 76.6% (30 users)
Task 3: 56.82% - 87.82% with an expected success ratio of 72.32% (32 users)

So you can get a much narrower range for your estimate, but 30+ users is a significant undertaking for a usability test.
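To get a feel for the trade-off, the sketch below holds the Task 1 success ratio at 2/3 and varies the number of participants. It assumes the same interval arithmetic that appears to lie behind the figures in this post (adjusted point estimate, standard error on the original n), so it is an illustration rather than the article's method. The participant counts are made up.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence level


def half_width(x, n, z=Z):
    """Half the width of the 95% range for x successes in n attempts."""
    point = (x + z * z / 2) / (n + z * z)
    # Assumption: standard error computed on the original n
    return z * math.sqrt(point * (1 - point) / n)


# Keep the observed ratio at 2/3 and scale up the participant count
for n in (6, 12, 30, 60, 120):
    x = round(n * 2 / 3)
    print(f"n = {n:3d}: +/- {half_width(x, n):.1%}")
```

The range shrinks roughly with the square root of the sample size, so halving it takes about four times as many participants.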
