Recommendations for Designing Audio Augmented Reality Experiences

Published: 19 November 2019

In our first post on Audio Augmented Reality (AAR), Anna Nolda Nagele wrote about the development of a multiplayer experience, "Please Confirm You Are Not A Robot", which allowed her to conduct user testing into what was possible with this new technology. To evaluate the experience, Anna tested the final prototype with four groups of four participants each. In this post, Anna makes some suggestions for people designing multiplayer AAR experiences for smart headphones. As there is little original research on multiplayer AAR, this study covered several broad themes to gather initial insights into areas that could be explored in further, more in-depth research.

Our objective was to examine how spatial audio can prompt and support actions in interactive AAR experiences; how distinct auditory information influences collaborative tasks and group dynamics; and how participatory AAR can enhance storytelling.

User Experience Study

To evaluate the experience, we conducted a study using a combination of research tools: qualitative questionnaires, observation and guided group discussion. As the experience is designed for groups, we combined group discussion with individual questionnaires. The questionnaires allowed individuals to express their opinions without the bias that can occur when talking in a group.

We compared results from the observations with insights from the group discussion and questionnaires, grouping the results into categories which helped us derive a set of design recommendations.

Results

Players were impressed and surprised by spatial audio, especially if they had never heard it before. People used their whole body to interact with sounds and found it interesting to navigate using their ears, leading with their heads and using their arms and heads as pointers in real space to navigate the sonic space. However, it was difficult for them to pinpoint the exact locations of sounds, because sound positioning in the experience was unstable due to the high latency of the head-tracking sensor. This latency led to confusion and frustration.

Participants used a range of verbal and non-verbal communication while listening, including at moments when each of them was hearing different, individual audio elements. At these points, people described to each other what they heard. When the narration overlapped with another participant speaking, people actively chose which one to listen to. When technical issues occurred, participants tried to clarify among themselves whether this was an intentional part of the experience's design.

The performative aspect was well received: participants took on roles and group responsibilities, collaborating to identify when they were hearing different information. People were prompted to move together, and they started to take on a part when they felt they had the agency to interpret instructions in their own way or together with the other participants. Participants found it strange to act with strangers at first, but everyone reported feeling a little closer to the group by the end, and most were smiling or laughing. Players stayed engaged and attentive throughout, and reflected on the group activities and the content after the experience had ended.

Analysis and Discussion

The head-tracking latency on the Bose Frames made it difficult for users to localise spatial audio. They perceived strong spatial audio cues but regarded the positioning as unstable. For those without any previous experience of spatial audio, a training section in the experience would help them understand the cues, hardware controls and feedback sounds.

Nevertheless, people showed an interest in spatial audio. It was best received when it was unclear whether a sound was real or augmented. Feedback sounds must be clearly distinguishable from other noises and should acknowledge gestures even when those gestures are unsuccessful.

There should be a clear relationship between actions in real life and augmented sounds, since awareness of the surrounding audio environment is a characteristic of audio augmented reality. The user's attention shifts between the real and augmented worlds, and they can choose which to pay attention to. AAR should always connect to and integrate with reality. It should enhance a real experience and thus increase the feeling of presence in the augmented reality. There is potential to create a heightened sense of presence in both the AAR and the real world to which it is linked.

Keeping the experience going is a collaborative effort by the group. This is visible when participants work out information that only they receive individually (and not the rest of the group), when tasks are unclear, or when the technology breaks. They learn from each other in terms of movements and interactions with the technology. A clear distinction between individual and shared information is desirable to avoid confusion. Occasionally, participants found that discussing a task they had been asked to do distracted them from the task itself. When the audio leaves no room for verbal communication, people actively use body language to communicate with the rest of the group. It is uncomfortable for users to be interrupted by a sound while they are still talking. Equally, they didn't like interrupting the narrator themselves.

It became confusing for participants when information drifted out of sync and they had to wait for one another, so clear checkpoints to synchronise the experience should be established. People usually remained still when listening, and it quickly became overwhelming when too much information arrived at the same time. Participants prioritised what to do (listen, move or speak) both as a group and in pairs.

The participants enjoyed receiving a unique role in the story, embodying it and keeping it a secret between themselves and the narrator, "Pi", until they were instructed to act on Pi's behalf. The narrator of the story becomes unexpectedly authoritative, and people do only what they are told. Participants themselves are unsure how far the voice could push them. They build trust in the narrator and want to live up to the expectations the story sets for them.

Players inherently have an audience and perform for each other, paying attention to what they hear and to what others do and say, and adjusting to each other to such a degree that each group creates a unique dynamic. People come to behave similarly, watching and learning from each other and copying gestures, movements and interactions. There is significant social potential for movement and sound in AAR to get people moving together and loosening up with each other.

Design recommendations for multiplayer AAR experiences for smart headphones:

Provide training for 3D audio listening. Demonstrate what spatial sounds are and where to locate them.
Include a clear onboarding sequence which introduces the technology and its possibilities.
Design sufficient feedback sounds and explain them in the onboarding.
Consider the context in which people will hear the experience and design with this in mind to explore the full potential of mixing real and augmented sounds.
Don't design for full immersion; awareness of the real surroundings is a characteristic of AAR.
Allow interactions with narrated content (e.g. repeat, pause).
Avoid long stretches of narration, which can cause users to zone out.
Don't overwhelm with too much information. Allow for moments of focused listening.
Avoid ambiguous stretches of silence. Fade in and out of them clearly or mask them with ambient sounds.
Consider the characteristics of your target hardware (in our test, Bose Frames), such as head-tracker latency and control recognition issues, in the experience design.
Design for natural interactions with spatial sound (e.g. hand gestures, body movement).
Gamify experiences by providing an auditory display of status and a comparison with other participants.
Always deliver information to individual players with clear intent, and make it clear when information is private to one player or shared with the group.
Use spatial sounds over narrator instructions to guide the listeners or inspire movement and conversation.
Make room for people to collaborate and learn from each other when considering gestures, movements or interactions with the sounds.
Build trust in the experience but don't abuse it - give the authority to perform back to the audience.
Give participants a clear role or character in the story.
Authorise the audience to behave naturally and use the freedom that comes with lightweight, wireless technology.

Further Research

This study opened up many exciting areas for research. Most notably, further research could look at the communication between participants when navigating a shared augmented sound environment.

Another observation worth investigating was the clear emergence of distinct group identities. It did not become apparent whether this was due to AAR, but each group showed specific characteristics shared by all four of its players. Each group came to move as one organism. This led to each group having a slightly different experience, which could be linked to the freedom the audience had to find their own way through the story by interpreting it differently.

Freedom is another aspect to investigate in future research. Wearing AAR equipment allows immense freedom with minimal restriction of movement or location compared to other XR technologies.

To expand on these design recommendations, the next step would be to look at the tools and materials that can be used to design and tell stories in AAR experiences.

If you haven't already, read our first post on audio AR, where we detail the development of the four-player performance-based AAR experience "Please Confirm You Are Not A Robot".
