- How would you ask to play a song?
- How would you ask about the weather?
- What speech or conversational interfaces have you used before?
- How does it seem different from speech interfaces you’ve used before?
- Please draw the system you imagine behind the interface.
- Speech interfaces are still extremely new and don’t seem to be as widespread as some of the popular numbers suggest. Fewer than half of my users had used a speech interface before.
- The arpeggiated tones confused users because they demanded too much attention.
- Users who hadn’t used a speech interface before had great difficulty articulating how my design was different from existing ones.
- A couple people understood that the different sounds indicated state, “like the lights on top of the Alexa”. The same people noticed the different voices, but couldn’t figure out why there were multiple voices.
- Many users wished the voices were more “analogue sounding”.
Updates and Outputs
The main sonic modification I’ve made is to remove the arpeggiated piano, and instead map state to an LFO of a synthesizer. The sound is much less distracting than the randomized piano notes.
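As a rough illustration of mapping state to an LFO, here is a minimal Python sketch. It is not the patch used in the mockup: the state names, LFO rates, carrier frequency, and the `render_tone` function are all assumptions made for the example. Each state selects a different LFO rate, and the LFO modulates the amplitude of a steady tone, so a change in state is heard as a change in how quickly the sound pulses.

```python
import math

# Hypothetical assistant states and LFO rates in Hz (assumed values, not taken from the mockup).
LFO_RATE_HZ = {"idle": 0.25, "listening": 1.0, "processing": 4.0}


def render_tone(state, duration_s=1.0, sample_rate=8000, carrier_hz=220.0):
    """Return samples of a sine carrier whose amplitude follows a state-dependent LFO."""
    lfo_hz = LFO_RATE_HZ[state]
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        # Keep the LFO between 0 and 1 so the tone pulses instead of flipping phase.
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * lfo_hz * t))
        samples.append(lfo * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples
```

Because only the modulation rate changes between states, the pitch stays constant, which is one plausible reason a mapping like this is less distracting than randomized notes.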
I’ve also added visual cues to my mockup to help viewers understand where each sound is coming from.
My most important takeaway from testing and designing speech interfaces is that they have the same layers as common computer interfaces.
The layers of a speech interface are:
- operating system
Although I’ve realized I’ve been too ambitious with my project scoping for this 2-credit class, applying the four-layer framework to audio interfaces points to several specific problems I might work on in the future.
As with any other HCI, there is a human user.
In speech interfaces, applications are not clearly separated from operating systems.
On Amazon’s Alexa platform, skills can be added, which is functionally similar to adding an application. However, instead of each application having its own distinct interface, most popular apps are still accessed through a conversation with Alexa.
Additionally, each assistant app’s abilities are being expanded by its development team, but the source of the new abilities is not always shown to users.
In both cases, the user cannot distinguish between the interface application and the application itself.
- unique interfaces for each app
- interactions other than speech where applicable
3. Operating System
Operating systems typically handle:
- resource sharing for multiple apps running at the same time
- common tools shared across apps or used directly by a user, e.g., a clock and calendar
Speech operating systems’ current primary forms of sonification are chimes and speech synthesis. The two types of sound are chained together, similar to a scripted conversation. It seems odd that mobile and desktop devices have a wider range of interface sounds than an audio-first interface.
The linear chaining is not well suited to one of an operating system’s most important tasks: managing parallel processes.
For example, since there is no sonification mapped to the state of a timer, a user must query the assistant for the current state. If there are multiple timers running at once, the user must inquire about each, one after another.
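To make the timer example concrete, here is a minimal sketch of how several timers could be sonified in parallel instead of queried one by one. Everything in it is hypothetical: the `Timer` class, the `sonify` function, and the choice of pitch and pulse rate as parameters are assumptions for illustration, not features of any existing assistant. Each timer gets its own pitch so the timers stay distinguishable, and its pulse rate speeds up as it nears completion.

```python
from dataclasses import dataclass


@dataclass
class Timer:
    name: str
    total_s: float      # full duration of the timer, assumed to be positive
    remaining_s: float  # time left on the timer


def sonify(timers, base_hz=220.0, min_pulse_hz=0.5, max_pulse_hz=4.0):
    """Map each running timer to a (name, pitch_hz, pulse_hz) voice."""
    voices = []
    for i, t in enumerate(timers):
        # Progress runs from 0.0 (just started) to 1.0 (finished), clamped to that range.
        progress = 1.0 - max(0.0, min(1.0, t.remaining_s / t.total_s))
        # Space the timers a perfect fifth apart so each one has its own pitch.
        pitch_hz = base_hz * (1.5 ** i)
        # Pulse faster as the timer approaches zero.
        pulse_hz = min_pulse_hz + progress * (max_pulse_hz - min_pulse_hz)
        voices.append((t.name, pitch_hz, pulse_hz))
    return voices
```

For instance, `sonify([Timer("tea", 100, 100), Timer("oven", 100, 0)])` yields a slow pulse at 220 Hz for the tea timer and a fast pulse at 330 Hz for the oven timer, so a listener could tell both states apart without asking about either.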
- communicating state through mapped sonification
- providing a managed, uncluttered soundscape
- expanded vocabulary of sounds
- operating system utilities
- example: Beverly Chou, Filling Out a Form
The hardware of voice assistants can be complicated.
- The underlying operating system is often running on cloud hardware.
- There might be several devices with microphones capable of being activated for the audio interface.
- There might be several different devices with speakers capable of relaying the system’s responses back.
The ecology of devices with internet connections and speakers/microphones is changing rapidly, but seems to be expanding in most households. It is easy to imagine a future in which the soundscape of a home is an important HCI.
It also seems apparent that existing connected devices might become smarter in the future. For example, many TVs already have some form of connection to the internet, but are not integrated into assistant service platforms.
- additional integrations of TVs, phones, tablets, and gaming consoles into smart soundscapes