The basic idea of the article is that usability tests:
- Typically involve small participant numbers
- Report task completion rates as a primary success measure
- Typically calculate task completion rates as the number of successes / number of attempts (x/n)
The authors compare a number of estimation methods - Laplace, Wilson, Jeffreys, MLE (x/n) and one of their own construction, 'Split-difference' - and recommend a particular alternative to the x/n method depending on the sample size and MLE value.
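As a rough sketch of what those estimators look like, here are the standard textbook versions in Python. (The article's 'Split-difference' method is its own construction, so I've left it out; these formulas are the usual definitions and may differ in detail from the article's.)

```python
# Point estimators for a completion rate, given x successes out of n attempts.

def mle(x, n):
    """Maximum likelihood estimate: the raw proportion x/n."""
    return x / n

def laplace(x, n):
    """Laplace estimate: add one success and one failure."""
    return (x + 1) / (n + 2)

def jeffreys(x, n):
    """Jeffreys estimate: add half a success and half a failure."""
    return (x + 0.5) / (n + 1)

def wilson(x, n, z=1.96):
    """Wilson point estimate: add z^2/2 successes and z^2/2 failures
    (roughly 2 of each at 95% confidence)."""
    return (x + z**2 / 2) / (n + z**2)
```

For 4 successes out of 6 attempts, these give noticeably different answers at small n, which is exactly why the choice of estimator matters in usability testing.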
This article is well worth a read and should provide you with some extra depth to your analytical toolkit.
As a follow on from this, let's assume you've run a usability test and have the following:
Task 1: 4/6 successful completions = 66.67% success
Task 2: 4/5 successful completions = 80% success
Task 3: 6/8 successful completions = 75% success
[Note: typically, yes, each task would have the same number of users, but I'm making this up, so I can say what I want.]
The journal article tells us that in reality we can say the following:
Task 1: the real completion rate at launch should be somewhere between 21% and 99.3%, but we expect it to be around 60%;
Task 2: the real completion rate at launch should be somewhere between 25.7% and 100%, but we expect it to be around 67%;
Task 3: the real completion rate at launch should be somewhere between 34.3% and 99.5%, but we expect it to be around 67%; and
even then, we can only be 95% confident that those ranges actually contain the true completion rate.
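If you want to produce ranges like these yourself, here is one standard option: the Wilson score interval, which behaves well at small sample sizes. Note this is illustrative only - the article recommends specific methods for specific sample sizes, so its exact bounds won't match these.

```python
# Wilson score interval for a binomial proportion,
# given x successes out of n attempts, at 95% confidence (z = 1.96).
import math

def wilson_interval(x, n, z=1.96):
    p = x / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - spread) / denom, (centre + spread) / denom

for task, (x, n) in {"Task 1": (4, 6), "Task 2": (4, 5), "Task 3": (6, 8)}.items():
    lo, hi = wilson_interval(x, n)
    print(f"{task}: {lo:.1%} - {hi:.1%}")
```

The intervals it produces for these tasks are similarly wide, which is the point: at 5-8 users, any method gives you a very broad range.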
Kind of depressing, really, isn't it?
Note: if you use around 30 test subjects instead, and maintain the same success ratios for each task, then you could expect the following:
Task 1: 47.7% - 81.9% with an expected success ratio of 64.8% (30 users)
Task 2: 61.44% - 91.75% with an expected success ratio of 76.6% (30 users)
Task 3: 56.82% - 87.82% with an expected success ratio of 72.32% (32 users)
So you can get a much narrower range for your estimate, but 30+ users is a significant undertaking for a usability test.
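To see how the interval tightens as the sample grows, we can hold the success ratio at roughly 2/3 and compute a 95% interval at increasing n. This sketch uses the adjusted-Wald (Agresti-Coull) interval, a common small-sample recommendation; the article's own method may differ, so the exact bounds won't match the figures above.

```python
# Adjusted-Wald (Agresti-Coull) 95% interval: shift the point estimate by
# adding roughly 2 successes and 2 failures, then use the normal approximation.
import math

def adjusted_wald(x, n, z=1.96):
    p = (x + z**2 / 2) / (n + z**2)
    spread = z * math.sqrt(p * (1 - p) / (n + z**2))
    return p - spread, p + spread

# Same ~2/3 success ratio at increasing sample sizes.
for x, n in [(4, 6), (20, 30), (80, 120)]:
    lo, hi = adjusted_wald(x, n)
    print(f"{x}/{n}: {lo:.1%} - {hi:.1%} (width {hi - lo:.1%})")
```

The interval width roughly halves each time the sample size quadruples, which is why going from 6 users to 30 buys you so much precision - and why going much further becomes an exercise in diminishing returns.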