Designing for Voice: Crafting Natural Conversations with AI
Strategies and Best Practices for Creating Engaging and Intuitive Voice User Interfaces
Voice user interfaces (VUIs) are rapidly gaining popularity, allowing users to interact with devices like Amazon Alexa and Google Home using voice commands. A VUI is a type of user interface that enables us to communicate with computers or devices through voice or speech, rather than traditional input methods like typing or clicking. As more people adopt voice-activated technology, UX professionals face new challenges in creating engaging and intuitive VUIs that cater to users' needs and expectations.
Designing for voice interactions is a significant departure from traditional visual interface design. Many designers rely heavily on graphical elements, layout, and visual hierarchy to guide users and communicate information effectively. However, when designing VUIs, these visual cues are absent, requiring designers to focus on sound, tone, and conversation flow to create effective user experiences.
The emergence of advanced language models like ChatGPT has opened up new possibilities for VUIs, enabling more natural and human-like conversations. These AI-powered assistants can understand context, provide relevant responses, and engage in open-ended conversations. As voice technology continues to evolve, we must adapt our skills and knowledge to create VUIs that are functional, emotionally engaging, and trustworthy, reshaping the way we interact with technology in our daily lives.
Understanding the Characteristics of Voice Interaction
Voice interaction differs from traditional graphical user interface (GUI) interaction in several key ways:
Linear and ephemeral: Voice interactions are linear and ephemeral, meaning that users can only process one piece of information at a time, and unlike visual interfaces, they cannot easily refer back to previous information (Murad et al., 2018). In a GUI, users can scan the screen and quickly find the information they need, but in a VUI, they must rely on their memory and the system's prompts to navigate the conversation.
Lack of visual cues: Without visual cues, users must rely on auditory feedback and memory to navigate the interface, which can increase cognitive load. In a GUI, designers use visual elements like buttons, menus, and icons to guide users and provide affordances for interaction. In a VUI, designers must use sound design, voice prompts, and conversation flow to guide users and communicate the available actions.
Natural language: Users interact with VUIs using natural language, which can be ambiguous and context-dependent. Designers must account for the variability in how users express themselves (Pearl, 2016). Unlike GUIs, where users interact with a limited set of predefined controls, VUIs must be able to understand and respond to a wide range of user utterances, including different phrasings, accents, and linguistic styles. The infamous Scottish elevator sketch, in which a voice-controlled lift fails to understand its passengers' accents, demonstrates some of these challenges.
Invisibility: VUIs are invisible, meaning that users can't see the available options or the system's capabilities at a glance (Yankelovich et al., 1995). In a GUI, users can explore the interface and discover new features by browsing menus and clicking buttons. In a VUI, users must rely on the system's prompts and their own mental model of what the system can do to discover new features and capabilities.
Designing for these unique characteristics requires a deep understanding of user needs, expectations, and mental models. Designers must anticipate how users might interact with the system using natural language. This requires a shift in thinking from visual design to conversational design, focusing on the flow of the conversation, the clarity of the prompts, and the naturalness of the responses.
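To make the variability in phrasing concrete, here is a minimal sketch of mapping several ways of saying the same thing onto one intent. The intent names and patterns are hypothetical and chosen only for illustration; production VUIs typically rely on trained language-understanding models rather than hand-written patterns like these.

```python
import re
from typing import Optional

# Hypothetical intents and trigger patterns, for illustration only.
INTENT_PATTERNS = {
    "lights_on": [r"\b(turn|switch) on\b.*\blights?\b", r"\blights? on\b"],
    "lights_off": [r"\b(turn|switch) off\b.*\blights?\b", r"\blights? off\b"],
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose pattern matches, or None if nothing matches."""
    # Normalise case and strip punctuation so phrasing variants compare equally.
    text = re.sub(r"[^\w\s]", "", utterance.lower())
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return intent
    return None


print(match_intent("Could you turn on the living room lights, please?"))  # lights_on
print(match_intent("Lights on!"))                                         # lights_on
print(match_intent("Play some jazz"))                                     # None
```

Even this toy matcher needs two patterns per intent to cover a handful of phrasings, which hints at how quickly the space of possible utterances grows.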
Best Practices for Designing Effective VUIs
When designing VUIs, it is crucial to begin by understanding the user's needs and the context in which they will engage with the VUI. This means taking into account the user's surroundings, the objectives they aim to achieve, and any constraints they may encounter. Conducting user research is key to achieving this.
Some other practices that we can keep in mind when designing VUIs are listed below:
Design for conversational flow: Create a natural and intuitive conversational flow that mimics human-to-human interaction. Use principles of turn-taking, feedback, and repair strategies to ensure smooth communication. For example, the system should provide clear cues when it's the user's turn to speak, and it should be able to handle interruptions and back-and-forth exchanges gracefully.
Keep interactions brief and focused: Users prefer short, focused interactions with VUIs. Design for brevity by breaking complex tasks into smaller, manageable steps and providing clear, concise prompts. For instance, instead of presenting users with a long list of options, ask them a series of short, specific questions to guide them through the task.
Provide clear feedback and confirmation: Ensure that users always know what the system is doing and what is expected of them. Provide clear feedback and confirmation messages to maintain transparency and build trust. For example, after the user makes a request, the system should confirm what it heard and provide an update on the status of the task. You can also use visual cues and specific sounds to achieve this.
Handle errors gracefully: Anticipate potential user errors and give users ways to recover from them. Use progressive disclosure to guide users back on track and provide helpful error messages that suggest alternative actions. For instance, if the user makes an ambiguous request, the system should ask for clarification and provide examples of what the user can say.
Optimise for hands-free and eyes-free interaction: Design for scenarios where users may be multitasking or have limited visual attention. Ensure that the VUI can be used effectively without visual feedback or manual input (Porcheron et al., 2018). For example, a voice-controlled cooking assistant should be able to guide users through a recipe step-by-step, without requiring them to look at a screen or use their hands.
Personalise the experience: Make use of user data and context to provide personalised interactions. Use natural language processing and machine learning to adapt to user preferences and behaviour over time. For instance, a voice-controlled music player should be able to learn the user's favourite genres and artists and make personalised recommendations based on their listening history.
Prioritise simplicity and clarity: Avoid jargon and complex language so that users of varying technical proficiency can easily interact with the interface. For example, opt for simple prompts like "To turn on the living room lights, just say 'turn on the living room lights'" instead of "To activate the illumination in the primary living space, verbalise the command 'initiate living room illumination.'"
Balance efficiency with discoverability: While users want interactions to be quick and efficient, they also need to be able to discover new features and capabilities. Several strategies can help strike this balance, such as progressive disclosure, contextual suggestions, and user-friendly onboarding and tutorials.
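The advice on feedback, confirmation, and error recovery can be sketched as a small dialogue handler. This is a minimal sketch under stated assumptions, not a reference implementation: the room names, prompt wording, and the SlotDialog class are all hypothetical. It confirms what it understood, and when it cannot find a room in the utterance it asks again, giving more guidance on each retry before offering a fallback.

```python
from dataclasses import dataclass

# Hypothetical prompts; each retry reveals more guidance than the last.
REPROMPTS = [
    "Sorry, I didn't catch that. Which room?",
    "Which room should I use? For example, say 'kitchen' or 'bedroom'.",
]
FALLBACK = "I'm still having trouble. You can try again later or use the app."
KNOWN_ROOMS = {"kitchen", "bedroom", "living room"}  # assumed device setup


@dataclass
class SlotDialog:
    """Tracks consecutive failed attempts to fill the 'room' slot."""
    failures: int = 0

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        # Look for any known room name inside the utterance.
        room = next((r for r in sorted(KNOWN_ROOMS) if r in text), None)
        if room is not None:
            self.failures = 0
            # Confirm what was understood so the user can catch mistakes.
            return f"Okay, turning on the {room} lights."
        if self.failures < len(REPROMPTS):
            prompt = REPROMPTS[self.failures]
            self.failures += 1
            return prompt
        # Out of reprompts: stop looping and offer a way out.
        return FALLBACK
```

A first unrecognised utterance gets the short reprompt, a second gets the version with examples, and a recognised room resets the counter. Keeping the first reprompt short respects the brevity guideline, while the second trades brevity for discoverability only when the user appears to need it.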
Addressing Privacy Concerns
Privacy is a common concern among VUI users. Many voice-based systems listen continuously for a wake word and can potentially collect sensitive data, so it is important to address these concerns directly. Some suggestions include:
Being transparent about data collection and usage: Clearly communicate what data is being collected, how it is used, and how users can control their privacy settings (Easwara Moorthy & Vu, 2015). For example, a voice-controlled smart speaker should provide clear information about what data it collects and offer users the ability to opt out of data collection or delete their voice recordings.
Providing privacy controls: Give users control over their data and the ability to delete or modify it as needed (Cho, 2019). For instance, a voice-controlled virtual assistant should allow users to review and delete their conversation history and provide granular controls over what types of data are collected.
Implementing secure data practices: Ensure that user data is securely stored and transmitted, and follow best practices for data protection and privacy (Cho, 2019). This includes using encryption, secure authentication, and regular security audits to protect user data from unauthorised access or breaches.
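As a rough illustration of the privacy controls described above, the sketch below models a per-user history that can be reviewed, deleted, and switched off. The VoiceHistory class and its method names are hypothetical; a real system would also need encrypted storage, authentication, and deletion from backups, none of which is shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceHistory:
    """Hypothetical per-user store of transcribed requests."""
    recording_enabled: bool = True
    entries: List[str] = field(default_factory=list)

    def record(self, transcript: str) -> bool:
        """Store a transcript only if the user has not opted out."""
        if not self.recording_enabled:
            return False
        self.entries.append(transcript)
        return True

    def review(self) -> List[str]:
        """Return a copy so callers cannot modify the stored history."""
        return list(self.entries)

    def delete_all(self) -> int:
        """Erase the history and report how many entries were removed."""
        count = len(self.entries)
        self.entries.clear()
        return count


history = VoiceHistory()
history.record("play some jazz")
history.recording_enabled = False      # user opts out of data collection
history.record("what's the weather")   # not stored
print(history.review())                # ['play some jazz']
print(history.delete_all())            # 1
```

Checking the opt-out flag inside record, rather than leaving it to each caller, means no code path can store a transcript after the user has opted out.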
Conclusion
Designing VUIs demands a user-centred approach, considering the unique aspects of voice interaction. By adhering to best practices in conversational design and addressing privacy concerns, we can develop engaging, effective VUIs that fulfil user needs and expectations.