Heuristic Evaluation and User Testing are two different techniques for finding usability problems (Lauesen & Musgrove, 2007). Heuristic Evaluation is done by having usability experts examine an interface, evaluate it against a list of industry standards, and identify problems. In User Testing, potential users try out the interface by performing real tasks, surfacing the problems that impact their experience.
Heuristic Evaluation
In Heuristic Evaluation, experts examine an interface and identify what is good and bad about it. The most popular heuristics used to achieve this are the ones developed by Nielsen and Molich (1990):
Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within a reasonable time.
Match between system and the real world: The system should speak the users’ language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
User control and freedom: Users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
Error prevention: Even better than good error messages is a careful design that prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.
Recognition rather than recall: Minimize the user’s memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
Flexibility and efficiency of use: Accelerators — unseen by the novice user — may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
Aesthetic and minimalist design: Dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.
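To make the checklist concrete, here is a minimal sketch (in Python) of how an evaluator might log problems against these ten heuristics during a Heuristic Evaluation. The `Finding` structure, its validation logic, and the example finding are illustrative assumptions, not part of any published tooling:

```python
from dataclasses import dataclass

# Nielsen and Molich's (1990) ten heuristics, used as the checklist.
HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

@dataclass
class Finding:
    heuristic: str    # which heuristic the problem violates
    location: str     # where in the interface it was observed
    description: str  # what the evaluator saw
    severity: int     # Nielsen's 0-4 severity scale (covered later in the article)

    def __post_init__(self):
        # Reject findings that don't map onto the checklist.
        if self.heuristic not in HEURISTICS:
            raise ValueError(f"unknown heuristic: {self.heuristic!r}")
        if not 0 <= self.severity <= 4:
            raise ValueError("severity must be between 0 and 4")

# A hypothetical finding, for illustration only:
upload_issue = Finding(
    heuristic="Visibility of system status",
    location="file upload dialog",
    description="no progress indicator while a large file uploads",
    severity=3,  # major usability problem
)
```

In practice, each evaluator typically works through the interface independently with a log like this, and the findings are then merged and severity-rated as a group.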
According to Wang & Caldwell (2002): “Using this technique is intuitive, inexpensive, and a quick way of getting a fairly comprehensive usability problem report”.
This method, however, has a number of drawbacks. In particular, it detects too many false positives (false alarms) and minor problems that wouldn’t bother actual users (Wang & Caldwell, 2002). This can result in designers and developers spending more time and effort fixing less critical issues, which can be costly to companies.
User Testing
User Testing is seen as the best way to identify the real problems that affect user performance and experience. Because actual users are the ones identifying the issues, fewer false positives are reported, and the problems found are usually worth investigating. Unlike Heuristic Evaluation, however, User Testing is labour intensive: it takes time to prepare the testing materials (e.g., interview guide, tasks), recruit users, and analyse the data. It can also be more expensive to run, as participants often require payment for their time.
What does research show?
Wang and Caldwell used Heuristic Evaluation and User Testing to test the usability of a pre-release version of the software they developed. Their major goal was to compare those two methods in terms of efficiency, effectiveness, and cost/benefit.
In their study, Heuristic Evaluation identified 58 unique problems compared to 10 identified in User Testing. Problem types and severities were rated using Nielsen's five-point severity scale: a score of 0 was given to false alarms, 1 to cosmetic problems, 2 to minor usability problems, 3 to major usability problems, and 4 to a usability catastrophe. User Testing revealed mostly major and minor problems, while 29.3% of the problems found by Heuristic Evaluation turned out to be false alarms.
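As a quick sanity check on those figures, the sketch below works out what 29.3% false alarms means in absolute terms. The `SEVERITY_LABELS` mapping restates Nielsen's published 0-4 scale; only the totals come from the paper, and the rounding is my own arithmetic:

```python
# Nielsen's five-point severity scale (0-4), as used in the study.
SEVERITY_LABELS = {
    0: "false alarm (not a usability problem)",
    1: "cosmetic problem",
    2: "minor usability problem",
    3: "major usability problem",
    4: "usability catastrophe",
}

# Totals reported by Wang & Caldwell (2002).
he_problems = 58            # unique problems from Heuristic Evaluation
ut_problems = 10            # unique problems from User Testing
he_false_alarm_rate = 0.293

# 29.3% of 58 problems is roughly 17 false alarms.
he_false_alarms = round(he_problems * he_false_alarm_rate)
print(f"~{he_false_alarms} of {he_problems} HE findings were severity 0:",
      SEVERITY_LABELS[0])
print(f"~{he_problems - he_false_alarms} HE findings described real problems")
```

In other words, roughly 17 of the 58 problems reported by the experts would likely not have troubled a real user at all.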
Heuristic Evaluation was also cheaper than User Testing in both direct cost and time cost: the researchers paid $10.54 for Heuristic Evaluation versus $47.30 for User Testing, and Heuristic Evaluation took 15.5 hours to conduct, including data analysis, while User Testing required 45 hours.
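Put as ratios, in a trivial sketch that uses only the numbers above:

```python
# Direct cost and time reported by Wang & Caldwell (2002).
he_cost, ut_cost = 10.54, 47.30   # US dollars
he_hours, ut_hours = 15.5, 45.0   # hours, including data analysis

print(f"User Testing cost {ut_cost / he_cost:.1f}x as much")     # ~4.5x
print(f"User Testing took {ut_hours / he_hours:.1f}x the time")  # ~2.9x
```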
At first glance, Heuristic Evaluation appears more appealing, as it is less expensive and less time-consuming than User Testing. However, User Testing provided better performance estimates. While Heuristic Evaluation identified more issues, a significant number of them were false alarms; User Testing, on the other hand, identified mostly major problems. Furthermore, Wang & Caldwell (2002) suggest that some problems associated with learning are hard to identify using Heuristic Evaluation alone: in this study, the test users identified two major problems that had not been detected by Heuristic Evaluation.
What does this mean for practitioners?
Heuristic Evaluation appears to be “a useful testing method in the earlier stages of software development”. It is a quick and cheap method that identifies a wide range of problems, and correcting errors is easier and cheaper in the earlier design stages than in the later ones. Early in development, Heuristic Evaluation has been shown to have a hit rate of around 50% while also reporting around 50% false problems (Lauesen & Musgrove, 2007). That false-alarm rate is why it cannot stand alone: even though User Testing has a higher cost and requires more time to run, correcting false problems is far more costly than the time and money saved by conducting Heuristic Evaluation by itself. Used early, Heuristic Evaluation can help the product team eliminate as many usability problems as possible, as soon as possible.

User Testing is more appropriate once a functional prototype of the software is available, as it detects real usability problems experienced by actual users. It also appears to be more effective than Heuristic Evaluation at finding major problems.
Heuristic Evaluation is a cheaper and quicker way to detect usability problems. However, it cannot replace User Testing, which provides more insightful data from actual users about the problems that genuinely affect their experience.
Read More
An Empirical Study of Usability Testing: Heuristic Evaluation Vs. User Testing, by Enlie Wang (journals.sagepub.com)
Severity Ratings for Usability Problems, by Jakob Nielsen (www.nngroup.com)
User Interface Design: A Software Engineering Perspective (www.pearson.com)
Heuristic Evaluation (Video), by Jakob Nielsen (www.nngroup.com)