Usability evaluation
Usability is generally defined as a measure of how easy and pleasant a design is to use. It is vital to evaluate usability throughout the development cycle to build a high-quality product that’s efficient, effective and satisfying to use. Depending on where you are in the development cycle, usability evaluations can be formative or summative.
A formative usability evaluation is conducted early in the design process, and is often repeated as part of an iterative process as you test whether changes based on previous findings have improved usability. It can involve evaluating concepts with paper sketches, wireframes, storyboards, various types of mockups, or clickable prototypes.
A summative usability evaluation is conducted later in the development cycle to evaluate a product in development, a working prototype, or a completed product. It is often used to establish a usability benchmark for a finished product and therefore frequently focuses on producing quantitative data (e.g. task completion time, failure rate). However, we find that a qualitative component is always valuable, and improvements to the product should still be considered after a summative evaluation.
Approaches to evaluating usability
Depending on the scope of your usability evaluation and whether it is formative or summative, a range of approaches can be useful. Often several are combined to meet the development team's need for usability insight.
Heuristic evaluation is a quick and inexpensive way to feed usability into the design process. Using a set of heuristics, a team of experts evaluates the design for its usability. The experts can be experienced members of your team who follow the heuristics to identify potential usability issues. While heuristic evaluation is a great way to inform the initial development stages, it shouldn't replace usability testing with users. Ideally, you should use heuristic evaluation to inform the scope of usability testing; see the sketch after the example below.
- Example: Nielsen’s Heuristics
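As an illustration (not prescribed by any particular standard), findings from a heuristic evaluation can be logged per expert with a severity rating and then aggregated to prioritise issues. The issue records below are invented; the 0-4 severity scale follows Nielsen's common convention.

```python
from collections import Counter

# Hypothetical issue log from a heuristic evaluation: each expert records
# the heuristic violated and a severity rating (0 = not a problem ...
# 4 = usability catastrophe, per Nielsen's common convention).
issues = [
    {"expert": "E1", "heuristic": "Visibility of system status", "severity": 3},
    {"expert": "E2", "heuristic": "Visibility of system status", "severity": 2},
    {"expert": "E1", "heuristic": "Error prevention", "severity": 4},
    {"expert": "E3", "heuristic": "Consistency and standards", "severity": 1},
]

# Count how often each heuristic was flagged across experts...
flag_counts = Counter(issue["heuristic"] for issue in issues)

# ...and rank heuristics by their worst reported severity, which can
# help scope the usability testing that should follow.
worst = {}
for issue in issues:
    worst[issue["heuristic"]] = max(worst.get(issue["heuristic"], 0),
                                    issue["severity"])

for heuristic, severity in sorted(worst.items(), key=lambda kv: -kv[1]):
    print(f"severity {severity}, flagged {flag_counts[heuristic]}x: {heuristic}")
```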
Usability testing involves testing the product, or prototypes of varying fidelity, with the intended user group. Users' interaction with the product is usually recorded on video, and researchers observe as participants follow scenarios and complete relevant tasks with the product.
Thinking-aloud testing involves the user verbalising what they're thinking, doing and feeling while performing a usability task. This can provide insight into the issues users encounter in performing a given task, as well as measurements such as task completion time and failure rate. In this method the involvement of the researcher is minimal; you should only prompt the participant when it's clear that they're stuck on a task and you need to move them on to complete the session. This method is particularly useful for exploring users' expectations of a product and comparing them to the actual experience. Thinking-aloud is commonly part of formative usability evaluations to inform further development.
Benchmark testing, in contrast to thinking-aloud testing, is focused on measuring the success rate on a set of given tasks to determine usability as defined by quantitative measures. These measures depend on the type of product evaluated but usually include time to complete a task, the number and types of use errors per task, and the number of users making a specific error. Benchmark testing is usually performed as part of a summative usability evaluation on a finished or late-stage product.
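As a minimal sketch of how such measures might be tabulated, assuming per-session records like the invented ones below (the field names and data are hypothetical):

```python
from statistics import mean

# Hypothetical raw data: one record per participant per task.
# time_s is the completion time in seconds (None if the task was
# abandoned); errors is the number of use errors observed.
sessions = [
    {"user": "P01", "task": "set_dose", "time_s": 74.0, "errors": 0},
    {"user": "P02", "task": "set_dose", "time_s": 121.5, "errors": 2},
    {"user": "P03", "task": "set_dose", "time_s": None, "errors": 3},  # failed
]

completed = [s for s in sessions if s["time_s"] is not None]

# Task success rate: share of participants who completed the task.
success_rate = len(completed) / len(sessions)

# Mean time on task, computed over successful attempts only -- a common
# convention, since failed attempts have no completion time.
mean_time = mean(s["time_s"] for s in completed)

# Use errors: total across participants, and how many users made at
# least one (i.e. the number of users making a specific type of error).
total_errors = sum(s["errors"] for s in sessions)
users_with_errors = sum(1 for s in sessions if s["errors"] > 0)

print(f"Success rate: {success_rate:.0%}")
print(f"Mean time on task (successes): {mean_time:.1f} s")
print(f"Use errors: {total_errors} total, {users_with_errors} users affected")
```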
Semi-structured interviews and satisfaction questionnaires can, together with thinking-aloud testing, be useful for better understanding what users liked or didn't like about a product, or why they made a use error. For example, a researcher would observe users completing a set of tasks and note down parts of the product to discuss later based on what they observed. Questionnaires can serve a similar purpose for a larger sample but are less flexible and less targeted in what they explore. They are more suitable for benchmark testing as they can produce quantitative measures.
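For instance, one widely used satisfaction questionnaire is the System Usability Scale (SUS), which converts ten 1-5 responses into a 0-100 score. A sketch of the standard scoring rule, with invented responses:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses.

    Odd-numbered items are positively worded (contribution: response - 1);
    even-numbered items are negatively worded (contribution: 5 - response).
    The summed contributions are scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Invented example: one participant's responses to items 1-10.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```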
A Wizard of Oz approach to user testing can be very useful for evaluating the usability of a product before it reaches a certain development stage. In this approach a "wizard" (i.e. a research team member) simulates functions that would actually be performed by technology in the final product, ideally without the user's knowledge. For example, in a product that is planned to use speech recognition technology, a team member listens to the users' commands and initiates the product's responses. This allows certain aspects of the development to progress even if the technology is not yet implemented.
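A minimal, hypothetical sketch of this pattern: a console stand-in where the wizard types what a speech recogniser would eventually return, while the rest of the product logic runs unchanged. All names and commands here are invented.

```python
# Wizard-of-Oz harness (hypothetical): instead of calling a real
# speech-recognition engine, the wizard listens to the participant
# and transcribes the command; the product logic is unaware.

COMMANDS = {
    "take reading": "Starting measurement...",
    "show history": "Displaying your last 7 readings.",
}

def recognise_speech_wizard():
    """Stand-in for a speech-recognition call: the wizard transcribes."""
    return input("[wizard] participant said: ").strip().lower()

def respond(command):
    return COMMANDS.get(command, "Sorry, I didn't understand that.")

if __name__ == "__main__":
    while True:
        heard = recognise_speech_wizard()
        if heard == "quit":  # wizard ends the session
            break
        print(respond(heard))
```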
Settings for usability evaluation
The setting for usability testing depends on the intended use of the product and the scope of the testing. For example, to assess the user interface only, a lab setting that controls for environmental factors is the most appropriate. If, however, the aim is to evaluate how the product works in its intended setting, a contextual approach in users' homes or a clinical setting is more suitable.
In between these two options there are setups that allow controlling the environment to various degrees while simulating in-context use (e.g. a lab setting designed to simulate home use).
Sample sizes
The sample size for usability testing can vary depending on scope and approach used. However, 5 to 12 users are usually considered sufficient. If your product has more than one intended user group (e.g. patients and healthcare practitioners) you should involve this number of users for each group.
Generally, if you're looking to establish a quantitative benchmark, a larger sample is needed than in thinking-aloud studies.
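The familiar "5 users" guidance for formative testing traces back to the problem-discovery model of Nielsen and Landauer, in which the share of usability problems found by n users is 1 - (1 - p)^n, where p is the probability that a single user encounters a given problem (about 0.31 on average in their data; your product's p may differ). A sketch:

```python
def problems_found(n_users, p=0.31):
    """Expected share of usability problems uncovered by n users.

    Problem-discovery model: 1 - (1 - p)^n, where p is the probability
    that one user encounters a given problem (0.31 is the average
    reported by Nielsen and Landauer; your product's p may differ).
    """
    return 1 - (1 - p) ** n_users

for n in (3, 5, 8, 12):
    print(f"{n:>2} users: {problems_found(n):.0%} of problems found")
# 3 users: 67%, 5 users: 84%, 8 users: 95%, 12 users: 99%
```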