Voice as Ecology: Voice Donation, Materiality, Identity

I first heard about voice donation while listening to “Being Siri,” an experimental audio piece about Erin Anderson donating her voice to the Boston-based voice donation company VocaliD. Like a digital blood bank of sorts, VocaliD provides a platform for donating one’s voice via digital audio recordings. These recordings are used to help technicians create a custom digital voice for a voiceless individual, providing an alternative to the predominantly white, male, mechanical-sounding assistive technologies used by people who cannot vocalize for themselves (think Stephen Hawking). VocaliD manufactures voices that better match a person’s race, gender, ethnicity, age, and unique personality. To me, VocaliD encapsulates the promise, complexity, and problematic nature of our current speech AI landscape and serves as an example of why we need to think critically about sound technologies, even when they appear to be wholly beneficial.
Given the extreme lack of sonic diversity in vocal assistive technologies, VocaliD provides a critically important service. But a closer look at both the rhetoric used by the organization and the material process involved in voice donation also amplifies the limits of overly simplistic, human-centric conceptions of voice. For instance, VocaliD rhetorically frames their service by persistently linking voice to humanity—to self, authenticity, individuality. Consider the following statements made by Rupal Patel, CEO and founder of VocaliD, in which she emphasizes the need for voice donation technology:
“Here’s a way for us to acknowledge these individuals as unique human beings.” (Fast Company)
“I was talking to [a] girl we made a voice for. She told me that people are finally seeing her for who she really is.” (Medieros)
These are just a few examples from a larger discourse that reinforces the connection between voice and humanity. VocaliD’s repeated claims that their unique vocal identities humanize individuals imply that one is not fully human unless one’s voice sounds human. This rhetoric positions voiceless individuals as less than human (at least until they pay for a customized human-sounding voice).
VocaliD’s conflation of voice and humanity makes me wonder about the meaning of “human” in this context. For example, notions of humanity have been historically associated with Western whiteness—and deployed as a means of separating or distinguishing white people from Others—as Alexander Weheliye points out. Though VocaliD’s mission is to diversify manufactured voices, is a “human-sounding” voice still construed as a white voice? Does sounding human mean sounding white? Even if there is a bank of sonically diverse voices to choose from, does racial bias show up in the pacing, phrasing, or inflection caused by the vocal technology?

I am also disturbed by the rhetoric of humanity and individuality used by VocaliD because the company adopts the same rhetoric to describe the AI voices they sell to brands for media and smart products. Here’s an example of this rhetoric from the VocaliD AI website: “When you need a voice that resonates, evokes audience empathy, and sounds like you, rather than your competitors, VocaliD’s AI-powered vocal persona is the solution. Your voice — always on, where you need it when you need it.” Using similar rhetorical strategies to describe both voiceless people and products is dehumanizing. And yet, having a more diverse AI vocal mediascape, especially in terms of race, is crucially important since voice-activated machines and products are designed largely by white men who end up reinforcing the sonic color line.
Interestingly, the processes VocaliD uses to create a custom voice reveal that these voices are not, in fact, unique markers of humanity or individuality. It’s hard to find a detailed account of how VocaliD voices are made due to the company’s patents, but here are the basics: VocaliD does not transfer a donated voice directly to a voiceless person’s assistive technology. VocaliD technicians instead blend and digitally manipulate the donated voice with recordings of the noises a voiceless person can make (a laugh, a hum) to create a distinct new voice for the recipient. In other words, donated voices are skillful remixes that wouldn’t be possible without extracting vocal data and manipulating it with digital tools. Despite perpetuating narratives about voice, humanity, and authenticity, VocaliD’s creative blending of vocal material reveals that donated voices are the result of compositional processes that involve much more than people.
Further, considering VocaliD voices from a material rather than human-centric perspective amplifies something important about voices in general. All voices are composed of and grounded in an ecology. That is, voices emerge and are developed through a mixture of: (1) biological makeup (or technological makeup in the case of machines with voices); (2) specific environments and contexts (geography may determine the kind of accents humans have; AI voices have distinct sounds for their brands); (3) technologies (phones, computers, digital recorders and editors, software, and assistive technologies preserve, circulate, and amplify voices); and (4) others (humans often emulate the vocal patterns of the people they interact with most; many machine voices also sound like other machine voices). Put simply, all voices are intentionally and unintentionally composed over time—shaped by ever-changing bodily (and/or technological) states and engagements with the world. Voices are dynamic compositions by nature. Examining voice from a material standpoint shows that voices are not static markers of humanity; voices are responsive and malleable because they are the result of a complex ecology that involves much more than a “unique” human being.
However, focusing solely on the material aspects of vocality leaves out people’s lived experiences of voice. And based on online videos of VocaliD recipients—like Delaney, a seventeen-year-old with cerebral palsy—VocaliD voices seem to live up to the company’s hype. Delaney appears delighted by her new voice, stating: “I was so excited to get my own voice. I used to have a computer voice and now I sound like a girl. I like that. And I talk more.” Delaney’s teachers also discuss how her new voice completely changed her demeanor. Whereas before Delaney was reluctant to use her assistive technology to speak, her new voice gives her confidence and a stronger sense of identity. As her teacher explains in the video, “she is really engaged in groups, she wants to share her answers, she’s excited to talk with friends. It’s been really nice to see.” For Delaney, a VocaliD voice represents a newfound sense of agency.
It’s important to recognize this video is not necessarily representative of every VocaliD recipient’s experience, or even Delaney’s full experience. As Meryl Alper notes in Giving Voice, these types of news stories “portray technology as allowing individuals to ‘overcome’ their disability as an individual limitation, and are intended to be uplifting and inspirational for able-bodied audiences” (27). While we should be wary of the technological determinism in the video, observing Delaney use her VocaliD voice—and listening to the emotional responses of her mom and teachers—makes it difficult to deny that donated voices make a positive impact. For me, this video also gets at a larger truth about humans and voice: the ways we hear and understand our own voices, and the ways others interpret the sounds of our voices, matter a great deal. Voices are integral to our identities—to the ways we understand and think about ourselves and others—and the sounds of our voices have social and material consequences, as the SO! Gendered Voices Forum illustrates so clearly.

It’s worth repeating that VocaliD’s mission to diversify synthetic voices is incredibly important, especially given the restrictive vocal options available to voiceless individuals. It’s also necessary to acknowledge that the company has limitations that end up reproducing the structural inequities it tries to address. As Alper observes, “In order to become a speech donor, one must have three to four hours of spare time to record their speech, access to a steady and strong Internet connection, and a quiet location in which to record” (162-63). With these obstacles to donating one’s voice in mind, it’s not surprising that all the VocaliD recipient videos I could find feature white people. Donating one’s voice is much easier for middle- to upper-class white people who have access to privacy, Internet, and leisure time.
This brief examination of VocaliD raises questions about what a more equitable future for vocal technologies might look/sound like. Though I don’t have the answer, I believe that to understand the fullness of voice, we can’t look at it from a single perspective. We need to account for the entire vocal ecology: the material (biological, technological, financial, etc.) conditions from which a voice emerges or is performed, and individual speakers’ understanding of their culture, race, ethnicity, gender, class, ability, sexuality, etc. An ecological approach to voice involves collaborating with people and their vocal needs and desires—something VocaliD models already. But it also involves accounting for material realities: How might we make the barriers preventing a more diverse voice ecosystem less difficult to navigate—especially for underrepresented groups? In short, we must treat voice holistically. Voices are more than people, more than technologies, more than contexts, more than sounds. Understanding voice means acknowledging the interconnectedness of these things and how that interconnectedness enables or precludes vocal possibilities.
—
Featured image: 366-350 You can’t shut me up, Jennifer Moo, CC BY-ND
—
Steph Ceraso is an associate professor of digital writing and rhetoric at the University of Virginia. Her 2018 book, Sounding Composition: Multimodal Pedagogies for Embodied Listening, proposes an expansive approach to teaching with sound in the composition classroom. She also published a digital book in 2019 called Sound Never Tasted So Good: ‘Teaching’ Sensory Rhetorics—an exploration of writing, sound, rhetoric, and food. She is currently working on a book project that examines sonic forms of invention in various contexts.
—

REWIND! . . .If you liked this post, you may also dig:
What is a Voice?–Alexis Deighton MacIntyre
Mr. and Mrs. Talking Machine: The Euphonia, the Phonograph, and the Gendering of Nineteenth Century Mechanical Speech – J. Martin Vest
Only the Sound Itself?: Early Radio, Education, and Archives of “No-Sound”–Amanda Keeler
Technological Interventions, or Between AUMI and Afrocuban Timba

Editors’ note: As an interdisciplinary field, sound studies is unique in its scope: under its purview we find the science of acoustics, cultural representation through the auditory, and, to perhaps mis-paraphrase Donna Haraway, emergent ontologies. Not only are we able to see how sound impacts the physical world, but also how that impact plays out in bodies and cultural tropes. Most importantly, we are able to imagine new ways of describing, adapting, and revising the aural into aspirant, liberatory ontologies. The essays in this series all aim to push what we know a bit, to question our own knowledges and see where we might be headed. In this series, co-edited by Airek Beauchamp and Jennifer Stoever, you will find new takes on sound and embodiment, cultural expression, and what it means to hear. –AB
—
In November 2016, my colleague Imani Wadud and I were invited by professor Sherrie Tucker to judge a battle of the bands at the Lawrence Public Library in Kansas. The battle revolved around manipulation of one specific musical technology: the Adaptive Use Musical Instruments (AUMI). Developed by Pauline Oliveros in collaboration with Leaf Miller and released in 2007, the AUMI is camera-based software that enables various forms of instrumentation. It was first created in work with (and through the labor of) children with physical disabilities in the Abilities First School (Poughkeepsie, New York) and designed with the intention of researching its potential as a model for social change.

AUMI Program Logo, University of Kansas
Our local AUMI initiative KU-AUMI InterArts forms part of the international research network known as the AUMI Consortium. KU-AUMI InterArts has been tasked by the Consortium to focus specifically on interdisciplinary arts and improvisation, which led to the organization’s commitment to community-building “across abilities through creativity.” As KU-AUMI InterArts member and KU professor Nicole Hodges Persley expressed in conversation:
KU-AUMI InterArts seeks to decentralize hierarchies of ability by facilitating events that reveal the limitations of able-bodiedness as a concept altogether. An approach that does not challenge the able-bodied/disabled binary could dangerously contribute to the infantilizing and marginalization of certain bodies over others. Therefore, we must remain invested in understanding that there are scales of mobility that transcend our binary renditions of embodiment and we must continue to question how it is that we account for equality across abilities in our Lawrence community.
Local and international attempts to interpret the AUMI as a technology for the development of radical, improvisational methods are by no means a departure from its creators’ motivations. In line with KU-AUMI InterArts and the AUMI Consortium, my work here is that of naming how communal, mixed-ability interactions in Lawrence have come to disrupt the otherwise ableist communication methods that dominate musical production and performance.
The AUMI is designed to be accessed by those with profound physical disabilities. The AUMI software works using a visual tracking system, represented on-screen with a tiny red dot that begins at the very center. Performers can move the dot’s placement to determine which part of their body and its movement the AUMI should translate into sound. As one moves, so does the dot, and, in effect, the selected sound is produced through the performer’s movement.
Could this curious technology help build radical new coalitions between researchers and disabled populations? Mara Mills’s research examines how the history of communication technology in the United States has advanced through experimentation with disabled populations. These populations have often been positioned as an exemplary pretext for funding, only to be denied access to the final product and sometimes even erased entirely from the history of a product’s development in the name of universal communication and capitalist accumulation. Therefore, the AUMI’s usage beyond the disabled populations first involved in its invention always stands on dubious historical, political, and philosophical ground. Yet, there is no doubt that the AUMI’s challenge to ableist musical production and performance has unexpectedly affected and reshaped communication for performers of different abilities in the Lawrence jam sessions, which speaks to its impressive coalitional potential. Institutional (especially academic) research invested in the AUMI’s potential then ought to, as its perpetual point of departure, loop back its energies in the service of disabled populations marginalized by ableist musical production and communication.
Facilitators of the library jam sessions, including myself, deliberately avoid exoticizing the AUMI and separating its initial developers and users from its present incarnations. To market the AUMI primarily as a peculiar or fringe musical experience would unnecessarily “Other” both the technology and its users. Instead, we have emphasized the communal practices that, for us, have made the AUMI work as a radically accessible, inclusionary, and democratic social technology. We are mainly invested in how the AUMI invites us to reframe the improvisational aspects of human communication upon a technology that always disorients and reorients what is being shared, how it is being shared, and the relationships between everyone performing. Disorientations reorient when it comes to our Lawrence AUMI community, because a tradition is being co-created around the transformative potential of the AUMI’s response-rate latency and its sporadic visual mode of recognition.
In his work on the AUMI, KU alumnus and sound studies scholar Pete Williams explains how the wide range of mobility typically encouraged in what he calls “standard practice” across theatre, music, and dance is challenged by the AUMI’s tendency to inspire “smaller” movements from performers. While he sees in this affective/physical shift the opportunity for able-bodied performers to encounter “…an embodied understanding of the experience of someone with limited mobility,” my work here focuses less on the software’s potential for able-bodied performers to empathize with “limited” mobility and more on the atypical forms of social interaction and communication the AUMI seems to evoke in mixed-ability settings. An attempt to frame this technology as a disability simulator not only marks a troubling departure from its original, intended use by children with severe physical disabilities, but also constitutes a prioritization of able-bodied curiosity that contradicts what I’ve witnessed during mixed-ability AUMI jam sessions in Lawrence.
Sure, some able-bodied performers may come to describe such an experience of simulated “limited” mobility as meaningful, but how we integrate this dynamic into our analyses of the AUMI matters, through and through. What I aim to imply in my read of this technology is that there is no “limited” mobility to experientially empathize with in the first place. If we hold the AUMI’s early history close, then the AUMI is, first and foremost, designed to facilitate musical access for performers with severe physical disabilities. Its structural schematic and even its response-rate latency and sporadic visual mode of recognition ought to be treated as enabling functions rather than limiting ones. From this position, nothing about the AUMI exists for the recreation of disability for able-bodied performers. It is only from this specific position that the collectively disorienting/reorienting modes of communication enabled by the AUMI among mixed-ability groups may be read as resisting the violent history of labor exploitation, erasure, and appropriation Mills warns us about: that is, when AUMI initiatives, no matter how benevolently universal in their reach, act fundamentally as a strategy for the efficacious and responsible unsettling of ableist binaries.
The way the AUMI latches on to unexpected parts of a performer’s body and the “discrepancies” of its body-to-sound response rate are at the core of what sets this technology apart from many other instruments, but it is not the mechanical features alone that accomplish this. Sure, we can find similar dynamics in electronics of all sorts that are “failing,” in one way or another, to respond with accuracies intended during regular use, or we can emulate similar latencies within most recording software available today. But what I contend sets the AUMI apart goes beyond its clever camera-based visual tracking system and the sheer presence of said “incoherencies” in visual recognition and response rate.

Image by Ray Mizumura-Pence at The Commons, Spooner Hall, KU, at rehearsals for “(Un)Rolling the Boulder: Improvising New Communities” performance in October 2013.
What makes the AUMI a unique improvisational instrument is the tradition currently being co-created around its mechanisms in the Lawrence area, and the way these practices disrupt the borders between able-bodied and disabled musical production, participation, and communication. The most important component of our Lawrence-area AUMI culture is how facilitators engage the instrument’s “discrepancies” as regular functions of the technology and as mechanical dynamics worthy of celebration. At every AUMI library jam session I have participated in, not once have I heard Tucker or other facilitators make announcements about a future “fix” for these functions. Rather, I have witnessed an embrace of these features as intentionally integrated aspects of the AUMI. It comes as no surprise, then, that a “Battle of the Bands” event was organized as a way of leaning even further into what makes the AUMI more than a radically accessible musical instrument––that is, its relationship to orientation.
Perhaps it was the competitive framing of the event (we offered small prizes to every participating band), or the diversity among that day’s participants, or even the numerous times some of the performers had previously used this technology, but our event saw a deliberate and collaborative improvisational method unfold in preparation for the performances. An ensemble mentality began to congeal even before performers entered the studio space, when Tucker first encouraged performers to choose their own fellow band members and come up with a working band name. The two newly formed bands, Jayhawk Band and The Human Pianos, took turns laying down collaboratively premeditated improvisations with composition (and perhaps even prizes) in mind. iPad AUMIs were installed in a circle on stands, with studio monitor headphones available for each performer.
Jayhawk Band’s eponymous improvisation “Jayhawks,” which brings together stylized steel drums, synthesizers, an ’80s-sounding floor tom, and a plucked woodblock sound, exemplifies this collaborative sensory ethos, unique in the seemingly discontinuous melding of its various sections and the play between its mercurial tessellations and amalgamations:
In “Jayhawks,” the floor tom riffs are set along a rhythmic trajectory defiant of any recognizable time signature, and the player switches suddenly to a woodblock/plucking instrument mid-song (00:49). The composition’s lower-pitched instrument, sounding a bit like an electronic bass clarinet, opens the piece and, starting at 00:11, repeats a melodically ascending progression also uninhibited by the temporal strictures of time signature. In fact, all the melodic layers in “Jayhawks” demonstrate a kind of temporally “unhinged” ensemble dynamic present in most of the library jam sessions that I’ve witnessed. Yet unexpected moves and elements ultimately cohere for jam session performers, such as Jayhawk Band’s members, because certain general directions were agreed upon prior to hitting “record,” whether this entails sound bank selections or compositional structure. All that to say that collective formalities are certainly at play here, despite the song’s fluid temporal/melodic nuances suggesting otherwise.
Five months after the battle of the bands, The Human Pianos and Jayhawk Band reunited at the library for a jam session. This time, performers were given the opportunity to prepare their individual iPad setup prior to entering the studio space. These customized setup selections were then transferred to the iPads inside the studio, where the new supergroup recorded their notoriously polyrhythmic, interspecies, sax-riddled composition “Animal Parade”:
As heard throughout the fascinating and unexpected moments of “Animal Parade,” the AUMI’s sensitivity can be adjusted for even the most minimal physical exertion, and its sound bank ranges from orchestral instruments, animal sounds, and synthesizers to various percussive instruments, dynamic adjustments, and even prefabricated loops. Yet, no matter how familiar a traditionally trained (and often able-bodied) musician may be with their sound selection, the concepts of rhythmic precision and musical proficiency, as they are defined within dominant understandings of time and consistency, are thoroughly scrambled by the visual tracking system’s sporadic mode of recognition and its inherent latency. As described above, it is structurally guaranteed that the AUMI’s red dot will not remain in its original place during a performance but will instead latch onto unexpected parts of the body.
Simultaneously, the dot-to-movement response rate is not immediate. My own involvement with “the unexpected” in communal musical production and performance moulds my interpretation of what is socially (and politically) at work in both “Jayhawks” and “Animal Parade.” While participating in AUMI jam sessions I could not help but reminisce about similar experiences with the collective management of orientations/disorientations that, while depending on quite different technological structures, produced similar effects regarding performer communication.
Being a researcher steeped in the L.A. area Salsa, Latin Jazz, and Black Gospel scenes meant that I was immediately drawn to the AUMI’s most disorienting-yet-reorienting qualities. In Timba, the form of contemporary Afrocuban music that I most closely studied back in Los Angeles, disorientations and reorientations are the most prized structural moments in any composition. For example, the 1997 performance of “No Me Mires a Los Ojos” (“Don’t Look at Me In the Eyes”) by Issac Delgado’s ensemble, featuring now-legendary performances by Ivan “Melon” Lewis (keyboard), Alain Pérez (bass), and Andrés Cuayo (timbales), sonically reveals the tradition’s call to disorient and reorient performers and dancers alike through collaborative improvisations:
Video Filmed by Michael Croy.
“No Me Mires a los Ojos” is riddled with moments of improvisational coalition formed rather immediately and then resolved in a return to the song’s basic structure. For listeners disciplined by Western musical training, the piece may seem to traverse several time signatures, even though it is written entirely in 4/4 time. Timba accomplishes an intense, percussively demanding, melodically multifaceted set of improvisations that happen all at once, with the end goal of making people dance, nodding at the principal tradition it draws its elements from: Afrocuban Rumba. Every performer who is not a horn player or a vocalist is articulating patterns specific to their instrument, played in the form of basic rhythms expected at certain sections. These patterns and their variations evolved from similar Rumba drum and bell formats, and the improvisational contributions each musician is expected to integrate into their basic pattern also come from Rumba’s long-standing tradition of formalized improvisation. The formal and the improvisational function as a single communicative practice in Timba. Performers recall format from their embodied knowledge of Rumba and other pertinent influences while disrupting, animating, and transforming pre-written compositions with constant layers of improvisation.
What ultimately interests me the most about the formal registers within the improvisational tradition that is Timba is that these seem to function, on at least one level, as premeditated terms for communal engagement. This kind of communication enables a social set of interactions that, like Jazz, grants every performer the opportunity to improvise at will, insofar as the terms of engagement are seriously considered. As with the AUMI library jam sessions, Timba’s disorientations, too, seem to reorient. What is different, though, is how the AUMI’s sound bank acts in tandem with a performer’s own embodied musical knowledge as an extension of the archive available for improvisation. In Timba, the sound bank and knowledge of form are both entirely embodied, with synthesizers being the only exception.
Timba ensembles and their interpretations of traditional and non-Cuban forms, like the AUMI and its sound bank, use reliable and predictable knowledge bases to break with dominant notions of time and its coherence, only to wrangle performers back to whatever terms of communal engagement were previously decided upon. In this sense, I read the AUMI not as a solitary instrument but as a partial orchestration of sorts, with functions that enable not only an accessible musical experience but also social arrangements that rely deeply on a more responsible management of the unexpected. While the Timba ensemble is required to collaboratively instantiate the potential for disorientations, the AUMI provides an effective and generative incorporation of said potential as a default mechanism of instrumentation itself.

Image from “How do you AUMI?” at the Lawrence Public Library
As the AUMI continues on its early trajectory as free, downloadable software designed to be accessed by performers of mixed abilities, it behooves us to listen deeply to the lessons learned by orchestral traditions older than our own. Timba does not come without its own problems of social inequity (it is often a “boys’ club,” for one), but there is much to learn about how the traditions built around its instruments have managed to centralize the value of unexpected, multilayered, and even complexly simultaneous patterns of communication. There is also something to be said about the necessity of studying the improvisational communication patterns of musical traditions that have not yet been institutionalized or misappropriated within “first world” societies. Timba teaches us that the conga alone will not speak without the support of a community that celebrates difference, the nuances of its organization, and the call to return to difference. It teaches us, in other words, to see the constant need for difference and its reorganization as a singular practice.
The work started with the AUMI’s earliest users in Poughkeepsie, New York, and the work involving mixed-ability ensembles in Lawrence, Kansas, today are connected through the AUMI Consortium’s commitment to a kind of research aimed at listening closely and deeply to the AUMI’s improvisational potential interdisciplinarily and undisciplinarily across various sites. A tech innovation alone will not sustain the work of disrupting the longstanding, rooted forms of ableism ever-present in dominant musical production, performance, and communication, but mixed-ability performer coalitions organized around a radical interrogation of coherence and expectation may have a fighting chance. I hope the technology team never succeeds at working out all of the “discrepancies,” as these are helping us to build traditions that frame the AUMI’s mechanical propensity towards disorientation as the raw core of its democratic potential.
—
Featured Image: by Ray Mizumura-Pence at The Commons, Spooner Hall, KU, at rehearsals for “(Un)Rolling the Boulder: Improvising New Communities” performance in October 2013.
—
Caleb Lázaro Moreno is a doctoral student in the Department of American Studies at the University of Kansas. He was born in Trujillo, Peru and grew up in the Los Angeles area. Lázaro Moreno is currently writing about methodological designs for “the unexpected,” contributing thought and praxis that redistribute agency, narrative development, and social relations within academic research. He is also a multi-instrumentalist, composer, and producer; check out his SoundCloud.
—
REWIND! . . .If you liked this post, you may also dig:
Introduction to Sound, Ability, and Emergence Forum –Airek Beauchamp
Unlearning Black Sound in Black Artistry: Examining the Quiet in Solange’s A Seat At the Table — Kimberly Williams
Experiments in Agent-based Sonic Composition — Andreas Duus Pape