I first heard about voice donation while listening to “Being Siri,” an experimental audio piece about Erin Anderson donating her voice to the Boston-based voice donation company VocaliD. Like a digital blood bank of sorts, VocaliD provides a platform for donating one’s voice via digital audio recordings. These recordings are used to help technicians create a custom digital voice for a voiceless individual, providing an alternative to the predominantly white, male, mechanical-sounding assistive technologies used by people who cannot vocalize for themselves (think Stephen Hawking). VocaliD manufactures voices that better match a person’s race, gender, ethnicity, age, and unique personality. To me, VocaliD encapsulates the promise, complexity, and problematic nature of our current speech AI landscape and serves as an example of why we need to think critically about sound technologies, even when they appear to be wholly beneficial.
Given the extreme lack of sonic diversity in vocal assistive technologies, VocaliD provides a critically important service. But a closer look at both the rhetoric used by the organization and the material process involved in voice donation also amplifies the limits of overly simplistic, human-centric conceptions of voice. For instance, VocaliD rhetorically frames their service by persistently linking voice to humanity—to self, authenticity, individuality. Consider the following statements made by Rupal Patel, CEO and founder of VocaliD, in which she emphasizes the need for voice donation technology:
These are just a few examples from a larger discourse that reinforces the connection between voice and humanity. VocaliD’s repeated claims that their unique vocal identities humanize individuals imply that one is not fully human unless one’s voice sounds human. This rhetoric positions voiceless individuals as less than human (at least until they pay for a customized human-sounding voice).
VocaliD’s conflation of voice and humanity makes me wonder about the meaning of “human” in this context. For example, notions of humanity have been historically associated with Western whiteness—and deployed as a means of separating or distinguishing white people from Others—as Alexander Weheliye points out. Though VocaliD’s mission is to diversify manufactured voices, is a “human-sounding” voice still construed as a white voice? Does sounding human mean sounding white? Even if there is a bank of sonically diverse voices to choose from, does racial bias show up in the pacing, phrasing, or inflection caused by the vocal technology?
I am also disturbed by the rhetoric of humanity and individuality used by VocaliD because the company adopts the same rhetoric to describe the AI voices they sell to brands for media and smart products. Here’s an example of this rhetoric from the VocaliD AI website: “When you need a voice that resonates, evokes audience empathy, and sounds like you, rather than your competitors, VocaliD’s AI-powered vocal persona is the solution. Your voice — always on, where you need it when you need it.” Using similar rhetorical strategies to describe both voiceless people and products is dehumanizing. And yet, having a more diverse AI vocal mediascape, especially in terms of race, is crucially important since voice-activated machines and products are designed largely by white men who end up reinforcing the sonic color line.
Interestingly, the processes VocaliD uses to create a custom voice reveal that these voices are not, in fact, unique markers of humanity or individuality. It’s hard to find a detailed account of how VocaliD voices are made due to the company’s patents, but here are the basics: VocaliD does not transfer a donated voice directly to a voiceless person’s assistive technology. VocaliD technicians instead blend and digitally manipulate the donated voice with recordings of the noises a voiceless person can make (a laugh, a hum) to create a distinct new voice for the recipient. In other words, donated voices are skillful remixes that wouldn’t be possible without extracting vocal data and manipulating it with digital tools. Despite perpetuating narratives about voice, humanity, and authenticity, VocaliD’s creative blending of vocal material reveals that donated voices are the result of compositional processes that involve much more than people.
Further, considering VocaliD voices from a material rather than human-centric perspective amplifies something important about voices in general. All voices are composed of and grounded in an ecology. That is, voices emerge and are developed through a mixture of: (1) biological makeup (or technological makeup in the case of machines with voices); (2) specific environments and contexts (geography may determine the kind of accents humans have; AI voices have distinct sounds for their brands); (3) technologies (phones, computers, digital recorders and editors, software, and assistive technologies preserve, circulate, and amplify voices); and (4) others (humans often emulate the vocal patterns of the people they interact with most; many machine voices also sound like other machine voices). Put simply, all voices are intentionally and unintentionally composed over time—shaped by ever-changing bodily (and/or technological) states and engagements with the world. Voices are dynamic compositions by nature. Examining voice from a material standpoint shows that voices are not static markers of humanity; voices are responsive and malleable because they are the result of a complex ecology that involves much more than a “unique” human being.
However, focusing solely on the material aspects of vocality leaves out people’s lived experiences of voice. And based on online videos of VocaliD recipients—like Delaney, a seventeen-year-old with cerebral palsy—VocaliD voices seem to live up to the company’s hype. Delaney appears delighted by her new voice, stating: “I was so excited to get my own voice. I used to have a computer voice and now I sound like a girl. I like that. And I talk more.” Delaney’s teachers also discuss how her new voice completely changed her demeanor. Whereas before Delaney was reluctant to use her assistive technology to speak, her new voice gives her confidence and a stronger sense of identity. As her teacher explains in the video, “she is really engaged in groups, she wants to share her answers, she’s excited to talk with friends. It’s been really nice to see.” For Delaney, a VocaliD voice represents a newfound sense of agency.
It’s important to recognize this video is not necessarily representative of every VocaliD recipient’s experience, or even Delaney’s full experience. As Meryl Alper notes in Giving Voice, these types of news stories “portray technology as allowing individuals to ‘overcome’ their disability as an individual limitation, and are intended to be uplifting and inspirational for able-bodied audiences” (27). While we should be wary of the technological determinism in the video, observing Delaney use her VocaliD voice—and listening to the emotional responses of her mom and teachers—makes it difficult to deny that donated voices make a positive impact. For me, this video also gets at a larger truth about humans and voice: the ways we hear and understand our own voices, and the ways others interpret the sounds of our voices, matter a great deal. Voices are integral to our identities—to the ways we understand and think about ourselves and others—and the sounds of our voices have social and material consequences, as the SO! Gendered Voices Forum illustrates so clearly.
It’s worth repeating that VocaliD’s mission to diversify synthetic voices is incredibly important, especially given the restrictive vocal options available to voiceless individuals. It’s also necessary to acknowledge the company has limitations that end up reproducing the structural inequities it tries to address. As Alper observes, “In order to become a speech donor, one must have three to four hours of spare time to record their speech, access to a steady and strong Internet connection, and a quiet location in which to record” (162-63). With these obstacles to donating one’s voice in mind, it’s not surprising that all the VocaliD recipient videos I could find feature white people. Donating one’s voice is much easier for middle- to upper-class white people who have access to privacy, Internet, and leisure time.
This brief examination of VocaliD raises questions about what a more equitable future for vocal technologies might look/sound like. Though I don’t have the answer, I believe that to understand the fullness of voice, we can’t look at it from a single perspective. We need to account for the entire vocal ecology: the material (biological, technological, financial, etc.) conditions from which a voice emerges or is performed, and individual speakers’ understanding of their culture, race, ethnicity, gender, class, ability, sexuality, etc. An ecological approach to voice involves collaborating with people and their vocal needs and desires—something VocaliD models already. But it also involves accounting for material realities: How might we make the barriers preventing a more diverse voice ecosystem less difficult to navigate—especially for underrepresented groups? In short, we must treat voice holistically. Voices are more than people, more than technologies, more than contexts, more than sounds. Understanding voice means acknowledging the interconnectedness of these things and how that interconnectedness enables or precludes vocal possibilities.
Featured image: 366-350 You can’t shut me up, Jennifer Moo, CC BY-ND
Steph Ceraso is an associate professor of digital writing and rhetoric at the University of Virginia. Her 2018 book, Sounding Composition: Multimodal Pedagogies for Embodied Listening, proposes an expansive approach to teaching with sound in the composition classroom. She also published a digital book in 2019 called Sound Never Tasted So Good: ‘Teaching’ Sensory Rhetorics—an exploration of writing, sound, rhetoric, and food. She is currently working on a book project that examines sonic forms of invention in various contexts.
On May 5, 2018, the C-ville Weekly, a newspaper based out of Charlottesville, Virginia, published an article titled “Sex, drugs and rock ’n’ roll: new apartment complex promises at least one of those.” The headline referred to the complex being built at 600 West Main St. in Charlottesville. The complex has since been completed and studio bedrooms currently cost more than $1000 a month. As the C-ville Weekly headline shows, the developers were using the term and connotations of “rock ’n’ roll” to sell exclusive – and in many ways unaffordable – housing.
After reading this headline, I began to develop an idea for a summer course at my institution, the University of Virginia (UVA). I ultimately titled that course “Black Music and Corporate America,” which I offered online during the summer of 2021 (syllabus available for download via the link above). Although the course discussed varied content – from the multi-ethnic, multi-racial, and multi-gendered histories of rock and roll to the endorsement of conspicuous forms of consumption in hip hop – I wanted to spend one unit focusing on the interrelationship between music, corporate America, and gentrification. I strove to solidify this connection by assigning two related articles. The first article, by geographer and sociologist Brandi Thompson Summers, argues that black residents in Washington D.C. adopt go-go music as a form of reclamation aesthetics to combat their city’s increasingly rampant gentrification. In the second article, ethnomusicologist Allie Martin conducts a soundwalk of D.C.’s Shaw District to forefront the experience of a black woman in the city and help displace white hearing as the default standard of interpreting sound (see Sounding Out!’s Soundwalking While POC series from Fall 2019). These two articles served as a foundation for one of the assignments the students had to complete in class: conducting a soundwalk of their own in which they had to walk around a field site of their choosing and think critically about the sounds they were hearing.
Throughout the summer sessions, students completed three main assignments related to the course topic. They had to think about marketing themselves and thus wrote a cover letter for a job or internship they were interested in pursuing in the future. We also, as a class, sent a suggestion to literary scholar John Patrick Leary, who has created a list of “keywords of capitalism:” buzzwords that get adopted in corporate lingo; we suggested “rockstar” as a term and offered him a brief explanation why:
Students also had to conduct a soundwalk. I asked them to model it after Martin’s and to also take into consideration Summers’ arguments about gentrification, white policing of black sound, and a community’s response to attempts to silence their music and culture.
The soundwalks I received merit sharing with readers of Sounding Out! for three primary reasons: 1) the assignment benefited from the online format, especially since students could conduct soundwalks in Charlottesville as well as in their homes across the country; 2) the students made compelling arguments that deserve recognition; and 3) the students brought up issues that teachers interested in assigning soundwalks in the future might want to preemptively address.
Students who walked around Charlottesville focused mostly on The Corner, the portion of the city where most of UVA’s student body eats, shops, and drinks. As one student noted, during the regular semester the hundreds of students populating The Corner on any given day can drown out – literally – the concerns of the homeless and the panhandlers who make the area their home. However, over the summer, Charlottesville’s Corner becomes significantly less populated and, as this student noted, much quieter. As a result of this quiet, pedestrians might be much more attuned to Charlottesville’s rampant inequality. This student, over the course of their summer soundwalk on The Corner, came to a radical conclusion: while communities might need moratoriums on evictions, or moratoriums on construction, maybe Charlottesville needs a moratorium on student noise as well.
In addition to focusing on inequality, many students’ soundwalks pointed out discrepancies between what they saw and what they heard. Another student writing about The Corner noted how, as a transfer student, the music that they heard emanating from a barbershop helped make them feel at home in Charlottesville. Businesses on The Corner have historically not been entirely welcoming to people of color. Additionally, most pedestrians and patrons of The Corner are white. However, this student remarked how comfortable they felt on The Corner because they could hear one of their favorite artists, Moneybagg Yo, playing from the sound system of the barbershop they were going to visit. Long before they could see the business, the soundscape let this student know they were welcome. In this way, the barbershop helped create a sense of community, much as the go-go music broadcast from Shaw’s many businesses does in Washington D.C.
Another student focused specifically on the contradictions between the activism they “saw” demonstrated in their upper-class Boston suburb and the activism they “heard” while walking around their neighborhood. This student noted that residents of their neighborhood strove to create an inclusive atmosphere by putting up “Black Lives Matter” and “Immigrants Welcome” yard signs. However, they also cited Jennifer Lynn Stoever, whose work we read in class, and noted the presence of what Stoever calls the “sonic color line.” As this student’s own field recordings of their neighborhood illuminated, most residents of this neighborhood valued silence. Harlemites during the 1940s and 1950s, as Stoever writes, certainly appreciated restful nights, but her scholarship also demonstrates how dominant narratives constructed black communities as “noisy,” “chaotic,” and “dangerous,” and white ones as “silent,” “efficient,” and “disciplined.” Although residents in this Boston suburb think of themselves as progressive and demonstrate their liberalism through visual signifiers such as yard signs, this student concluded that they still live in a community that privileges certain (silent) soundscapes. In doing so, such communities continue to perpetuate the sonic color line.
Admittedly, several students living in America’s suburbs struggled to conceive of the sounds they heard as worthy of discussion. For instance, the sounds of cars made frequent appearances in their writing but were often dismissed as inconsequential. Instead, students lamented that they were not experiencing a vibrant public sphere that resembled the setting of Spike Lee’s 1989 film, Do the Right Thing (a film we watched together in class), as if that representation wasn’t a very particular historicized and localized one. On an individual basis, I tried to get students to think more critically about the sounds of cars in their neighborhood. We read about the role of the automobile in the development of G-Funk during the early 1990s as well as the death of Jordan Davis, who was murdered in his car for playing rap too loudly. However, neither article resonated with students’ experience on their soundwalks since they were simply hearing cars passing by their houses or driving down the street. Most of the time, they could not tell what type of music, if any, was playing inside the car, nor could they hear it emanate onto the street.
Therefore, teachers, depending on the living conditions of their students, might want to preemptively include discussions of car culture within American society. After all, more than go-go music broadcast from storefronts, or second line parades, or music playing from boomboxes, or the noise of nature, (my) students typically hear cars in their day-to-day life. As a result, teachers assigning soundwalks may want to talk about the role of highway construction and the automobile industry in suburbanization and white flight. Discussions of automobiles within the context of environmental racism might also be useful for students to consider. Steph Ceraso’s Sounding Composition also discusses the immense time and energy corporations have devoted to car sounds and soundscapes within cars, buffering occupants from car noise as well as that of the neighborhoods outside.
In addition, I found that students need a more robust historical understanding of suburbanization in the United States, particularly alongside an understanding of their own racial and ethnic histories. Some students living in African American suburbs could have benefited from some contextualization about when and how those suburbs came to be. Talking about suburbanization in general, and about the development of white suburban liberalism in the 1970s and 1980s in particular, would have helped the student living in a Boston suburb make more sense of the politics of their neighborhood. Karen Tongson’s Relocations also provides context for shifts in America’s suburban landscape after sweeping changes in immigration law in 1965, as well as a rethinking of expressions of sexuality in the suburbs. These are just some topics I wish I had focused on more to help prepare my students for their soundwalks.
Future teachers may feel inclined to refer to the conclusions my students came to, as well as the literature I wish I had included in the course, as they think about assigning soundwalks in their own classes. Both my students and I appreciated the soundwalk assignment and its invitations to listen differently. Teaching soundwalks in a course focusing on “Black music and marketing strategy” prompted my own necessary meditation as a non-Black scholar working in this field. Guided by Loren Kajikawa’s new research on “Music, Hip Hop and the Challenge of Significant Difference,” which examines how the popularity of courses on black music helps subsidize a university’s classical music offerings, I want to incorporate future discussions of Black music as sonic diversity marketing in contemporary higher ed, both at the microlevel of scholarship and the macro-institutional level, which remains far from equitable despite ongoing challenges to its status quo. For students, the soundwalks–in their words–allowed them to learn about themselves and think differently about the area in which they live. They also became more attuned to their surroundings–questioning what makes a neighborhood and for whom?–and to how different cultures use their voices where they live, necessary skills for our moment that will help us envision a world beyond it.
Featured Image: Wall Mural right next to Bowerbird Bakeshop in Charlottesville, VA, image by Tom Mills, (CC BY-SA 2.0)
Rami Toubia Stucky is a PhD candidate at the University of Virginia and scholar of the music of the African diaspora, music of the Americas, commercial culture, intercultural exchange, and music and migration. Sometimes he composes/arranges jazz music and plays drums. He is currently writing a dissertation on the arrival of Brazilian bossa nova to the United States during the 1960s. He runs a personal and professional website dedicated mostly to talking about the songs his sister likes.