I first heard about voice donation while listening to “Being Siri,” an experimental audio piece about Erin Anderson donating her voice to the Boston-based voice donation company VocaliD. Like a digital blood bank of sorts, VocaliD provides a platform for donating one’s voice via digital audio recordings. These recordings are used to help technicians create a custom digital voice for a voiceless individual, providing an alternative to the predominantly white, male, mechanical-sounding assistive technologies used by people who cannot vocalize for themselves (think Stephen Hawking). VocaliD manufactures voices that better match a person’s race, gender, ethnicity, age, and unique personality. To me, VocaliD encapsulates the promise, complexity, and problematic nature of our current speech AI landscape and serves as an example of why we need to think critically about sound technologies, even when they appear to be wholly beneficial.
Given the extreme lack of sonic diversity in vocal assistive technologies, VocaliD provides a critically important service. But a closer look at both the rhetoric used by the organization and the material process involved in voice donation also amplifies the limits of overly simplistic, human-centric conceptions of voice. For instance, VocaliD rhetorically frames their service by persistently linking voice to humanity—to self, authenticity, individuality. Consider the following statements made by Rupal Patel, CEO and founder of VocaliD, in which she emphasizes the need for voice donation technology:
These are just a few examples from a larger discourse that reinforces the connection between voice and humanity. VocaliD’s repeated claims that their unique vocal identities humanize individuals imply that one is not fully human unless one’s voice sounds human. This rhetoric positions voiceless individuals as less than human (at least until they pay for a customized human-sounding voice).
VocaliD’s conflation of voice and humanity makes me wonder about the meaning of “human” in this context. For example, notions of humanity have been historically associated with Western whiteness—and deployed as a means of separating or distinguishing white people from Others—as Alexander Weheliye points out. Though VocaliD’s mission is to diversify manufactured voices, is a “human-sounding” voice still construed as a white voice? Does sounding human mean sounding white? Even if there is a bank of sonically diverse voices to choose from, does racial bias show up in the pacing, phrasing, or inflection caused by the vocal technology?
I am also disturbed by the rhetoric of humanity and individuality used by VocaliD because the company adopts the same rhetoric to describe the AI voices they sell to brands for media and smart products. Here’s an example of this rhetoric from the VocaliD AI website: “When you need a voice that resonates, evokes audience empathy, and sounds like you, rather than your competitors, VocaliD’s AI-powered vocal persona is the solution. Your voice — always on, where you need it when you need it.” Using similar rhetorical strategies to describe both voiceless people and products is dehumanizing. And yet, having a more diverse AI vocal mediascape, especially in terms of race, is crucially important since voice-activated machines and products are designed largely by white men who end up reinforcing the sonic color line.
Interestingly, the processes VocaliD uses to create a custom voice reveal that these voices are not, in fact, unique markers of humanity or individuality. It’s hard to find a detailed account of how VocaliD voices are made due to the company’s patents, but here are the basics: VocaliD does not transfer a donated voice directly to a voiceless person’s assistive technology. VocaliD technicians instead blend and digitally manipulate the donated voice with recordings of the noises a voiceless person can make (a laugh, a hum) to create a distinct new voice for the recipient. In other words, donated voices are skillful remixes that wouldn’t be possible without extracting vocal data and manipulating it with digital tools. Despite perpetuating narratives about voice, humanity, and authenticity, VocaliD’s creative blending of vocal material reveals that donated voices are the result of compositional processes that involve much more than people.
Further, considering VocaliD voices from a material rather than human-centric perspective amplifies something important about voices in general. All voices are composed of and grounded in an ecology. That is, voices emerge and are developed through a mixture of: (1) biological makeup (or technological makeup in the case of machines with voices); (2) specific environments and contexts (geography may determine the kind of accents humans have; AI voices have distinct sounds for their brands); (3) technologies (phones, computers, digital recorders and editors, software, and assistive technologies preserve, circulate, and amplify voices); and (4) others (humans often emulate the vocal patterns of the people they interact with most; many machine voices also sound like other machine voices). Put simply, all voices are intentionally and unintentionally composed over time—shaped by ever-changing bodily (and/or technological) states and engagements with the world. Voices are dynamic compositions by nature. Examining voice from a material standpoint shows that voices are not static markers of humanity; voices are responsive and malleable because they are the result of a complex ecology that involves much more than a “unique” human being.
However, focusing solely on the material aspects of vocality leaves out people’s lived experiences of voice. And based on online videos of VocaliD recipients—like Delaney, a seventeen-year-old with cerebral palsy—VocaliD voices seem to live up to the company’s hype. Delaney appears delighted by her new voice, stating: “I was so excited to get my own voice. I used to have a computer voice and now I sound like a girl. I like that. And I talk more.” Delaney’s teachers also discuss how her new voice completely changed her demeanor. Whereas before Delaney was reluctant to use her assistive technology to speak, her new voice gives her confidence and a stronger sense of identity. As her teacher explains in the video, “she is really engaged in groups, she wants to share her answers, she’s excited to talk with friends. It’s been really nice to see.” For Delaney, a VocaliD voice represents a newfound sense of agency.
It’s important to recognize this video is not necessarily representative of every VocaliD recipient’s experience, or even Delaney’s full experience. As Meryl Alper notes in Giving Voice, these types of news stories “portray technology as allowing individuals to ‘overcome’ their disability as an individual limitation, and are intended to be uplifting and inspirational for able-bodied audiences” (27). While we should be wary of the technological determinism in the video, observing Delaney use her VocaliD voice—and listening to the emotional responses of her mom and teachers—makes it difficult to deny that donated voices make a positive impact. For me, this video also gets at a larger truth about humans and voice: the ways we hear and understand our own voices, and the ways others interpret the sounds of our voices, matter a great deal. Voices are integral to our identities—to the ways we understand and think about ourselves and others—and the sounds of our voices have social and material consequences, as the SO! Gendered Voices Forum illustrates so clearly.
It’s worth repeating that VocaliD’s mission to diversify synthetic voices is incredibly important, especially given the restrictive vocal options available to voiceless individuals. It’s also necessary to acknowledge that the company has limitations that end up reproducing the structural inequities it tries to address. As Alper observes, “In order to become a speech donor, one must have three to four hours of spare time to record their speech, access to a steady and strong Internet connection, and a quiet location in which to record” (162-63). With these obstacles to donating one’s voice in mind, it’s not surprising that all the VocaliD recipient videos I could find feature white people. Donating one’s voice is much easier for middle- to upper-class white people who have access to privacy, Internet, and leisure time.
This brief examination of VocaliD raises questions about what a more equitable future for vocal technologies might look/sound like. Though I don’t have the answer, I believe that to understand the fullness of voice, we can’t look at it from a single perspective. We need to account for the entire vocal ecology: the material (biological, technological, financial, etc.) conditions from which a voice emerges or is performed, and individual speakers’ understanding of their culture, race, ethnicity, gender, class, ability, sexuality, etc. An ecological approach to voice involves collaborating with people and their vocal needs and desires—something VocaliD models already. But it also involves accounting for material realities: How might we make the barriers preventing a more diverse voice ecosystem less difficult to navigate—especially for underrepresented groups? In short, we must treat voice holistically. Voices are more than people, more than technologies, more than contexts, more than sounds. Understanding voice means acknowledging the interconnectedness of these things and how that interconnectedness enables or precludes vocal possibilities.
Featured image: 366-350 You can’t shut me up, Jennifer Moo, CC BY-ND
Steph Ceraso is an associate professor of digital writing and rhetoric at the University of Virginia. Her 2018 book, Sounding Composition: Multimodal Pedagogies for Embodied Listening, proposes an expansive approach to teaching with sound in the composition classroom. She also published a digital book in 2019 called Sound Never Tasted So Good: ‘Teaching’ Sensory Rhetorics—an exploration of writing, sound, rhetoric, and food. She is currently working on a book project that examines sonic forms of invention in various contexts.
For a number of semesters, I invited composition students to explore the idea of using the mixtape as a lens for envisioning a writing assignment about themselves. Initially called “The Mixtape Project,” this auto-ethnographical assignment employed philosophies from various scholars, but focused on Jared Ball and his concept of the mixtape as “emancipatory journalism.” In I Mix What I Like!: A Mixtape Manifesto, Ball pushed readers to imagine the mixtape as a counter-systematic soundbombing, circumventing elements of traditional record industry copyright practices (2011).
Essentially, a DJ could use a myriad of songs from different artists and labels to curate a mixtape with a desired theme and overarching message, then distribute the mixtape as a “for promotional use only” artifact. Throughout the 1980s, but predominantly in the 1990s and early 2000s, many DJs used mixtapes as the medium to promote their DJ brands and generate income. It wasn’t long before labels began to give hip-hop DJs record deals to release “album-style” mixtapes where the DJs record original content from artists made specifically for the DJ album (see DJ Clue, Funkmaster Flex, Tony Touch). This idea evolved into producer-based compilation albums, best depicted today by global icon DJ Khaled. Rappers also hopped on the mixtape wave, using the medium to jump-start their careers, create a “street buzz” around their music, and ultimately gauge the success of certain songs to craft and promote upcoming albums.
The assignment revolved around a mixtape framework in the earlier portion of my teaching career. More recently, I began to realize, as my students evolve (and I simultaneously age), that the “mixtape” – a sonic artifact distributed on cassette tape or CD – is becoming more remote to students. This thinking led to revising the assignment with a more contemporary twist. Thus, “The Playlist Project” was born: the first in a set of four major writing projects in a first-year writing classroom. The ultimate goal of the assignment was to immediately disrupt students’ relationships with academic writing, and to help them (re)envision the ways they embrace some of the cultural capital they value in college classrooms. To be clear, this was a particular type of mental break for students, a shift that was welcomed yet also uncomfortable for them.
“I Get It How I Live It”: Framing and Foregrounding the Assignment Set-Up
The course started with readings on plagiarism, intertextuality, and the hip-hop DJ’s use of sampling, curating, and storytelling. Next were readings by hip-hop artists describing their creative process and detailing their artistic choices sonically. These early readings helped pivot students from their stereotypical notions of what college writing courses – and writing assignments – looked like, and how they could enter scholarly discourse around composing. This conversation was foregrounded in students’ knowledge that they bring with them into the new academic space in the college classroom. My goal was to really focus on student-centered learning and culturally relevant pedagogy; ideally, if you are immersed in hip-hop music and culture, I want you to share that knowledge with the class. This sharing begins to create a community of thinking peers instead of a classroom with an English professor and a bunch of students who have to take the course “cuz it’s required in the Gen Ed, so I can’t take anything else ‘til I pass this!”
My research is entrenched in both hip-hop pedagogy and culture, specifically looking at the DJ as 21st century new media reader and writer. I liken my role as instructor to that of the DJ: a tastemaker and curator for the ways we understand sonic sources we know, and couple them with new and necessary soundbites that become critical to the cutting edge of the learning we need. I’ve engaged in the craft of DJing for more than half of my life, and use DJ practices as pedagogical strategies in my classroom environments.
The outcome of this curatorial moment was “the Playlist Project.” Students were asked to create their own playlists, which served as mixtapes that either “described the writer as a person” or “depicted the soundtrack to the writer’s perfect day.” This assignment was due during Week 6 of a 16-week semester, and was the first major writing assignment within the course. The assignment called for two specific parts: an actual playlist of the songs and an essay which served as a meta-text, describing not only the songs, but also the reasons why the songs were chosen and sequenced in a specific order. As an example, the guiding text we used was a DJ mixtape I created called “Heavy Airplay, All Day.”
“Heavy Airplay, All Day with No Chorus”: DJ Mixtape by Todd Craig
My playlist was a DJ-crafted tribute to a family friend who passed away in the summer of 2017: Albert “Prodigy” Johnson, Jr. The news of his untimely death reverberated through my psyche on that warm June afternoon; I remember meeting Prodigy when I was 15 years old. Many avid hip-hop listeners know Prodigy not only as one of the signature vocalists of the 1990s New York hip-hop sound, but also as one of the premier lyricists responsible for a shift in sonic content from emcees in New York and globally. His voice is one of the most sampled in hip-hop music.
One of the most anticipated moments of 2000 was the release of Prodigy’s first solo album, H.N.I.C. P was already shaking the industry with the lethal and bone-chilling visuals in his verses. But everyone knew he was on his way to dominance upon hearing the single “Keep it Thoro.” On this Alchemist-produced record, P basically broke industry rules with regard to typical hip-hop song construction; his verses were longer than the traditional 16-bar count, and the song had no chorus.
He returned to hip-hop basics: hard-hitting rhymes with undeniable visuals served atop a sonic landscape that kept everyone’s head nodding. P ends the song with the classic line “and I don’t care about what you sold/ that shit is trash/ bang this – cuz I guarantee that you bought it/ heavy airplay all day with no chorus/ I keep it thoro” (Prodigy 2000).
It was only right for me to create a tribute mixtape for Prodigy. And it felt right to start the Fall 2017 semester with the Playlist Project that used a shared text that celebrated and honored his memory. It highlighted the soundtrack to my perfect day: having my friend back to rewind all the memories that come with every song.
“I Got a New Flex and I Think I Like It”: (Re)inventing Mixtape Sensibilities in the Comp Classroom
The Playlist Project was aimed at achieving three different outcomes. The first goal was to invite students to use audio sources to envision a soundscape that explains a thread of logic. These sonic sources would hold as much value in our academic space as text-based sources, and would allow them to (re)envision what “evidence-based academic writing” looks like. Thus, students could utilize their own cultural capital to negotiate sound sources of their choosing.
The second was to get students to use a DJ framework to think about sorting, sequencing, and organization in writing. In our class discussions, one of the critical objectives was to get students to understand that the sequencing of divergent sound sources could drastically alter the story one is trying to tell. Mood, tone, and pacing are all critical components of how a message is expressed in writing, and this becomes even more evident when thinking about the sonic sources used by a DJ. Each song – a source in and of itself – is a piece of a puzzle that constructs a picture and tells a story. A source that opens a playlist can create a completely different effect if it is reconfigured to sit in the middle or at the end. Explaining these sonic choices in text-based writing would be the second step in the assignment.
Finally, students would engage in editing by joining both sound and text based on a theme they have selected. Again, sequencing becomes a critical DJ tool translated into the comp classroom. Using this pedagogical strategy echoes the ideas of using DJ techniques such as “blends” and “drops” as viable teaching tools (see Jennings and Petchauer 2017). Students would need to critically think through an important question: in creating the playlist, how does one manipulate and (re)configure sound to create a sonic landscape that “writes” its own unique story?
“But Does It Go In the Club?”: Outcomes and Initial Findings of The Playlist Project
The first iteration of the Playlist Project bore mixed results. Students found it difficult to think of this project as one whole assignment consisting of three different parts. Instead, they envisioned each of the three different pieces as isolated assignments. So the playlist was one part of the assignment. They picked the songs they liked; however, ordering and sequencing those songs to convey a logical theme or argument fell from the forefront of their composing. The essay then became its own piece divorced from the organic creation of the playlist. Thus, students weren’t “engaged in telling the story of the playlist.” Instead, students were making a playlist, then summarizing why their playlists contained certain songs.
For students who were more successful integrating the elements of the assignment, we were able to have rich and fruitful classroom conversations about both selection and sequencing. For example, one student chose the theme of “the Soundtrack to the Perfect Day.” Within that theme, the student chose the song “XO TOUR Llif3” by Lil Uzi Vert.
In the song’s hook, he croons “push me to the edge/ all my friends are dead/ push me to the edge/ all my friends are dead” (Vert 2017). When this song came up in class discussion, we were able to have a formative conversation around the idea that a perfect day entailed all of someone’s friends being “dead.” This also sparked a conversation about the double meaning of the quote; it didn’t stem from traditional print-based sources, but instead arose from a student-generated idea based in the cultural capital of the classroom community. In this moment, I was able to learn more from students about the meteoric rise in relevance of both the artist and the song which seemed to depict an extreme darkness.
“Big Big Tings a Gwaan”: Future Tweaks and Goals for The Playlist Project
Moving forward, I have considered breaking the assignment into three pieces for more introductory composition courses: constructing the playlist, sequencing the playlist, and writing the meta-text. In this configuration, the meta-text would truly become the afterthought (instead of the forethought) of the sonic creation. In addition, more in-depth soundwriting could emanate from the playlist construction, manipulation, (re)sequencing, and editing. I also plan to use the assignment in a more advanced-level composition course to gauge whether it unfolds differently. Tracing the trajectory of the assignment in an upper-level course may help in working backwards to calibrate it for students in introductory-level classes.
Another objective will be to move away from just a “playlist” and back into a “digital mixtape” format, where the playlist songs and sequencing become the fodder for a one-track, “one-take” DJ-inspired mixtape. While students don’t have to be DJs, creating a singular sonic moment digitally may embed them in the work of marrying soundwriting to depicting that sonic work in a meta-text. This work may also engage students in constructing sonic meta-texts, thereby immersing them in soundwriting practices. It can be done in Audacity, GarageBand, or any other software students are familiar with and comfortable using.
Featured Image: By Flickr User Gemma Zoey (CC BY-NC-ND 2.0)
Dr. Todd Craig is a native of Queens, New York: a product of Ravenswood and Queensbridge Houses in Long Island City. He is a writer, educator and DJ whose career meshes his love of writing, teaching and music. Craig’s research examines the hip-hop DJ as twenty-first century new media reader and writer, and investigates the modes and practices of the DJ as creating the discursive elements of DJ rhetoric and literacy. Craig’s publications include the multimodal novel tor’cha, a short story in Staten Island Noir and essays in textbooks and scholarly journals including Across Cultures: A Reader for Writers, Fiction International, Radical Teacher and Modern Language Studies. He was guest editor of Changing English: Studies in Culture and Education for the special issue “Straight Outta English” (2017). Craig is currently working on his full-length manuscript entitled “K for the Way”: DJ Literacy and Rhetoric for Comp 2.0 and Beyond. Dr. Craig has taught English Composition within the City University of New York for over fifteen years. Presently, Craig is an Associate Professor of English at Medgar Evers College, where he serves as the Composition Coordinator and City University of New York Writing Discipline Council co-chair.