I first heard about voice donation while listening to “Being Siri,” an experimental audio piece about Erin Anderson donating her voice to the Boston-based voice donation company VocaliD. Like a digital blood bank of sorts, VocaliD provides a platform for donating one’s voice via digital audio recordings. These recordings are used to help technicians create a custom digital voice for a voiceless individual, providing an alternative to the predominantly white, male, mechanical-sounding assistive technologies used by people who cannot vocalize for themselves (think Stephen Hawking). VocaliD manufactures voices that better match a person’s race, gender, ethnicity, age, and unique personality. To me, VocaliD encapsulates the promise, complexity, and problematic nature of our current speech AI landscape and serves as an example of why we need to think critically about sound technologies, even when they appear to be wholly beneficial.
Given the extreme lack of sonic diversity in vocal assistive technologies, VocaliD provides a critically important service. But a closer look at both the rhetoric used by the organization and the material process involved in voice donation also amplifies the limits of overly simplistic, human-centric conceptions of voice. For instance, VocaliD rhetorically frames their service by persistently linking voice to humanity—to self, authenticity, individuality. Consider the following statements made by Rupal Patel, CEO and founder of VocaliD, in which she emphasizes the need for voice donation technology:
These are just a few examples from a larger discourse that reinforces the connection between voice and humanity. VocaliD’s repeated claims that their unique vocal identities humanize individuals imply that one is not fully human unless one’s voice sounds human. This rhetoric positions voiceless individuals as less than human (at least until they pay for a customized human-sounding voice).
VocaliD’s conflation of voice and humanity makes me wonder about the meaning of “human” in this context. For example, notions of humanity have been historically associated with Western whiteness—and deployed as a means of separating or distinguishing white people from Others—as Alexander Weheliye points out. Though VocaliD’s mission is to diversify manufactured voices, is a “human-sounding” voice still construed as a white voice? Does sounding human mean sounding white? Even if there is a bank of sonically diverse voices to choose from, does racial bias show up in the pacing, phrasing, or inflection caused by the vocal technology?
I am also disturbed by the rhetoric of humanity and individuality used by VocaliD because the company adopts the same rhetoric to describe the AI voices they sell to brands for media and smart products. Here’s an example of this rhetoric from the VocaliD AI website: “When you need a voice that resonates, evokes audience empathy, and sounds like you, rather than your competitors, VocaliD’s AI-powered vocal persona is the solution. Your voice — always on, where you need it when you need it.” Using similar rhetorical strategies to describe both voiceless people and products is dehumanizing. And yet, having a more diverse AI vocal mediascape, especially in terms of race, is crucially important since voice-activated machines and products are designed largely by white men who end up reinforcing the sonic color line.
Interestingly, the processes VocaliD uses to create a custom voice reveal that these voices are not, in fact, unique markers of humanity or individuality. It’s hard to find a detailed account of how VocaliD voices are made due to the company’s patents, but here are the basics: VocaliD does not transfer a donated voice directly to a voiceless person’s assistive technology. VocaliD technicians instead blend and digitally manipulate the donated voice with recordings of the noises a voiceless person can make (a laugh, a hum) to create a distinct new voice for the recipient. In other words, donated voices are skillful remixes that wouldn’t be possible without extracting vocal data and manipulating it with digital tools. Despite perpetuating narratives about voice, humanity, and authenticity, VocaliD’s creative blending of vocal material reveals that donated voices are the result of compositional processes that involve much more than people.
Further, considering VocaliD voices from a material rather than human-centric perspective amplifies something important about voices in general. All voices are composed of and grounded in an ecology. That is, voices emerge and are developed through a mixture of: (1) biological makeup (or technological makeup in the case of machines with voices); (2) specific environments and contexts (geography may determine the kind of accents humans have; AI voices have distinct sounds for their brands); (3) technologies (phones, computers, digital recorders and editors, software, and assistive technologies preserve, circulate, and amplify voices); and (4) others (humans often emulate the vocal patterns of the people they interact with most; many machine voices also sound like other machine voices). Put simply, all voices are intentionally and unintentionally composed over time—shaped by ever-changing bodily (and/or technological) states and engagements with the world. Voices are dynamic compositions by nature. Examining voice from a material standpoint shows that voices are not static markers of humanity; voices are responsive and malleable because they are the result of a complex ecology that involves much more than a “unique” human being.
However, focusing solely on the material aspects of vocality leaves out people’s lived experiences of voice. And based on online videos of VocaliD recipients—like Delaney, a seventeen-year-old with cerebral palsy—VocaliD voices seem to live up to the company’s hype. Delaney appears delighted by her new voice, stating: “I was so excited to get my own voice. I used to have a computer voice and now I sound like a girl. I like that. And I talk more.” Delaney’s teachers also discuss how her new voice completely changed her demeanor. Whereas before Delaney was reluctant to use her assistive technology to speak, her new voice gives her confidence and a stronger sense of identity. As her teacher explains in the video, “she is really engaged in groups, she wants to share her answers, she’s excited to talk with friends. It’s been really nice to see.” For Delaney, a VocaliD voice represents a newfound sense of agency.
It’s important to recognize this video is not necessarily representative of every VocaliD recipient’s experience, or even Delaney’s full experience. As Meryl Alper notes in Giving Voice, these types of news stories “portray technology as allowing individuals to ‘overcome’ their disability as an individual limitation, and are intended to be uplifting and inspirational for able-bodied audiences” (27). While we should be wary of the technological determinism in the video, observing Delaney use her VocaliD voice—and listening to the emotional responses of her mom and teachers—makes it difficult to deny that donated voices make a positive impact. For me, this video also gets at a larger truth about humans and voice: the ways we hear and understand our own voices, and the ways others interpret the sounds of our voices, matter a great deal. Voices are integral to our identities—to the ways we understand and think about ourselves and others—and the sounds of our voices have social and material consequences, as the SO! Gendered Voices Forum illustrates so clearly.
It’s worth repeating that VocaliD’s mission to diversify synthetic voices is incredibly important, especially given the restrictive vocal options available to voiceless individuals. It’s also necessary to acknowledge the company has limitations that end up reproducing the structural inequities it tries to address. As Alper observes, “In order to become a speech donor, one must have three to four hours of spare time to record their speech, access to a steady and strong Internet connection, and a quiet location in which to record” (162-63). With these obstacles to donating one’s voice in mind, it’s not surprising that all the VocaliD recipient videos I could find feature white people. Donating one’s voice is much easier for middle- to upper-class white people who have access to privacy, Internet, and leisure time.
This brief examination of VocaliD raises questions about what a more equitable future for vocal technologies might look/sound like. Though I don’t have the answer, I believe that to understand the fullness of voice, we can’t look at it from a single perspective. We need to account for the entire vocal ecology: the material (biological, technological, financial, etc.) conditions from which a voice emerges or is performed, and individual speakers’ understanding of their culture, race, ethnicity, gender, class, ability, sexuality, etc. An ecological approach to voice involves collaborating with people and their vocal needs and desires—something VocaliD models already. But it also involves accounting for material realities: How might we make the barriers preventing a more diverse voice ecosystem less difficult to navigate—especially for underrepresented groups? In short, we must treat voice holistically. Voices are more than people, more than technologies, more than contexts, more than sounds. Understanding voice means acknowledging the interconnectedness of these things and how that interconnectedness enables or precludes vocal possibilities.
Featured image: 366-350 You can’t shut me up, Jennifer Moo, CC BY-ND
Steph Ceraso is an associate professor of digital writing and rhetoric at the University of Virginia. Her 2018 book, Sounding Composition: Multimodal Pedagogies for Embodied Listening, proposes an expansive approach to teaching with sound in the composition classroom. She also published a digital book in 2019 called Sound Never Tasted So Good: ‘Teaching’ Sensory Rhetorics—an exploration of writing, sound, rhetoric, and food. She is currently working on a book project that examines sonic forms of invention in various contexts.
REWIND! . . .If you liked this post, you may also dig:
What is a Voice?–Alexis Deighton MacIntyre
Eurovision—that televisual song pageant where pop, camp, and geopolitics annually collide—started last week. This year’s competition is hosted in Tel Aviv, and continues a recent trend in the competition in which geopolitical controversy threatens to overshadow pop spectacle. Activists accuse the Israeli government of exploiting Eurovision as part of a longstanding government PR strategy of “pinkwashing”: championing Israel as a bastion of LGBT+ tolerance in order to muddle perceptions of its violent and dehumanizing policies towards Palestinians. The BDS movement mobilized a campaign to boycott Eurovision. Reigning Eurovision champion Netta Barzilai, echoing many pro-Israel voices (as well as celebrities concerned about “subverting the spirit of the contest”), referred to the boycott efforts as “spreading darkness.”
While this year’s competition opened already mired in contention, I’m going to listen back to the controversial winning song of the 2016 contest, whose media frenzy peaked in its aftermath. That year’s champion, a pop singer of Crimean Tatar heritage who goes by the mononym Jamala, represented Ukraine with a song called “1944.” Just two years before, Crimea had been annexed from Ukraine by Russia following a dubious referendum. Some Crimean Tatars—the predominantly Sunni-Muslim Turkic-language minority group of Crimea—fled to mainland Ukraine following the Russian annexation, viewing the Ukrainian state as the lesser threat; many of those who stayed continue to endure a deteriorating human rights climate (though there are some Crimean Tatars who have bought into—and who reap benefits from—the new Russian administration of the peninsula).
Jamala’s very presence in the contest inevitably evoked the hot geopolitics of the moment. Her victory angered many Russians, and the subject of Eurovision became fodder for conspiracy theories as well as a target of disinformation campaigns waged online and in Russian-influenced media in Ukraine. In much of the Western European and North American media, the song was breathlessly interpreted as an assertion of indigenous rights and a rebuke to the perceived cultural genocide enacted against Crimean Tatars by Russian state power.
In the wake of her victory, many commentators described Jamala as giving voice not only to the repressed group of Crimean Tatar indigenes living in the Russian-annexed territory of Crimea, but to threatened indigenous populations around the world (for better or worse). But indeed, it was not only her metaphorical voice but the sound of vocal anguish that intensified the song’s effectiveness in the contest and made it relevant well beyond the specific geopolitical bog shared by Crimean Tatars, Ukrainians, and Russians. Specifically, the timbre, breath, and dynamic force of Jamala’s voice communicated this anguish—particularly during the virtuosic non-lexical—wordless—bridge of the song. Despite her expertly controlled vocal performance during the dramatic bridge, Jamala’s voice muddies the boundaries of singing and crying, of wailing from despair and yelling in defiant anger. To pilfer from J.L. Austin’s famous formulation, what made Jamala’s performative utterance felicitous to some and infelicitous to others was as much the sound of her voice as the words that she uttered. Put simply, on the bridge of “1944,” Jamala offers a lesson in how to do things with sound.
Some background: the world’s longest-running televised spectacle of song competition, the Eurovision Song Contest began in 1956 with the peaceful mandate of bringing greater harmony (sorry not sorry) to post-war Europe. Competitors—singers elected to represent a country with a single, three-minute song each—and voters come from the member countries of the European Broadcasting Union. The EBU is not geographically restricted to Europe. Currently, some fifty countries send contestants, including states such as Israel (last year’s winner), Azerbaijan, and Australia. Many of the rules that govern Eurovision have changed in its 62-year history, including restrictions governing which language singers may use. Today, it is common to hear a majority of songs with at least some text sung in English, including verses of “1944.” Some rules, though, have been immutable, including the following: songs must have words (although the words need not be sensical). All vocal sounds must be performed live, including background vocals. Voters, be they professional juries or the public—who can vote today by telephone, SMS, or app—cannot vote for their own nation’s competitor (though unproven conspiracy theories about fans crossing national borders in order to vote in defiance of this rule have, at times, flourished). Finally, reaching back to its founding mandate defining Eurovision as a “non-political event,” songs are not permitted to contain political (or commercial) messages.
Both the title and lyrics of Jamala’s “1944” refer to the year that Crimean Tatars were brutally deported from Crimea under Stalinist edict. Indicted wholesale as “enemies of the Soviet people,” the entire population of Crimean Tatars—estimated to be some 200,000 people—was rounded up by the NKVD, packed into cattle cars, and transported thousands of miles away, mostly to Uzbekistan and other regions of Central Asia. The Soviet regime cast this as a “humanitarian resettlement” intended to bring Crimean Tatars closer to other Muslim, Turkic-language populations. However, Crimean Tatars, who estimate that up to two-thirds of their population perished before arriving in Central Asia, consider this a genocidal act. They were not given the right to return to Crimea until the late 1980s. So, through clear reference to a twentieth-century political trauma with consequences that stretch into the present, “1944” was not the feel-good fluff of classic Eurovision.
Jamala’s performance of “1944” at Eurovision was also atypical in that it largely eschewed pizzazz and bombast. Little skin was shown, there were no open flames, no smoke machines befogged the scene. Instead, Jamala stood, mostly still and center stage, encircled by spotlight. Large projections of flowers framed the stage for the first two minutes of the song, as she sang verses (in English) and a chorus (in Crimean Tatar) that utilized lyrics from a well-known twentieth-century Crimean Tatar protest song called Ey, Güzel Qirim (Oh, My Beautiful Crimea). The groove of the song is spare and rather slow, and the singer’s voice meanders within a fairly narrow range on both verse and chorus.
But then comes the vocalise on the bridge: two minutes and fifteen seconds into the Eurovision performance, the song’s chilled-out but propulsive motion stops, leaving only a faint synthesizer drone. In the sudden quiet, Jamala mimes the act of rocking an infant. Beginning in the middle of her range, she elaborates a melismatic wail that recalls the snaking modal melody of the traditional Crimean Tatar song Arafat Daği. The bridge consists of two phrases interrupted by a forceful and nervous inhalation of breath. Her breath is loud and intentional, calling attention to the complex ornaments that she has already executed, and preparing us for more ornaments to come.
Over the course of eight seconds, Jamala’s voice soars upwards, increasing steadily in volume and intensifying timbrally from a more relaxed vocal sound to an anguished belt. At the apex of the bridge, the Eurovision camera soars above the stage just as the singer looks into the camera’s eye. Meanwhile, the screens framing the stage explode into visuals that suggest a phoenix rising from the ash. The crowd erupts into applause.
Other renditions of “1944” deliver a similar emotional payoff at the climax of the bridge. In the dystopian narrative of Jamala’s official music video, a tornado whips free, setting a field of immobilized human figures into chaotic motion (minute 2:35). In a reality TV song contest called Holos Kraïny (the Ukrainian Voice), a young singer’s powerful elaboration of the bridge propels a coach out of her seat as she wipes tears from her eyes (minute 3:42). In other covers, the bridge is too difficult to attempt: one British busker leaves the “amazing vocal bit in the middle” to “the good people of Ukraine to sing along.”
Timbrally and gesturally, I also hear the resonance between the plangent sound of the duduk—a double-reed wind instrument associated most closely with Armenia, and often called upon to perform in commemorations of the 1915 Armenian genocide—and Jamala’s voice on the vocalise. According to Jamala (who generously responded to my questions via email through her PR person), this was not intentional. But the prominence of the instrument in the arrangement, the lightly nasal quality that her voice adopts in the bridge, and the glottalized movements she uses between pitches suggest that this connection might have been audible to listeners. After all, the opening melodic gesture of “1944” is sounded by a duduk, and it re-enters spectacularly just after the peak of the bridge, where it doubles Jamala’s vocal line as it cascades downwards from the high note. Through sonic entanglement with the duduk, Jamala here communicates anguish on another register, without translation into words.
The performance of sonic anguish through the voice might be understood, in Greg Urban’s terms, as a “meta-affect.” Jamala delivers the emotion of anguish but also fosters sociality by interpellating listeners into the shared emotional state of communal grieving. I paraphrase from Urban’s well-known analysis of “ritual wailing” to argue that Jamala, through this performance of vocal anguish, makes both intelligible and acceptable the public sentiment of grief. This utterance of grief is a statement of “separation and loss that is canonically associated with death” (392) that included the Eurovision audience as co-participants in the experience of grieving, of experiencing anguish over loss. A popular fan reaction video by “Jake’s Face Reacts,” posted to YouTube, and the hundreds of comments responding to it, attest to this experience of co-participation in the experience of grief. Furthermore, the power of this meta-affect is almost certainly heightened through normative gendered associations with performative anguish. Lauren Ninoshvili (2012) identifies this in the “expressive labor” of mourning mothers’ wailing in the Republic of Georgia, while Farzaneh Hemmasi (2017) has recently elucidated how the voice of the exiled Iranian diva Googoosh became iconic of the suffering, feminized, victimized nation of Iran.
The sociologist of music Simon Frith once wrote that “in songs, words are the sign of the voice” (97). To put it in slightly banal terms, songs, as we generally define them, include words uttered by human voices. (Or if they don’t have words uttered by voices, this becomes the notable feature of the song, cf. Mendelssohn’s Songs Without Words, Pete Drake’s talking guitar, Georgian vocable polyphony). But non-lexical vocalities also function as a sign of the voice, and, as scholars such as Ana Maria Ochoa (2014) and Jennifer Stoever (2016) have argued, expand our capacity to recover more complex personhoods from the subjugated vocalities of the past. In fact, often the most communicative, feelingful parts of songs occur during un-texted vocalizations. As generations of scholars have argued, timbre means a lot—Nina Eidsheim’s The Race of Sound: Listening, Timbre, and Vocality in African American Music (Duke University Press: 2019) presents a very recent example—and it is often overlooked when we take the key attributes of Western Art Music as our sole formal parameters for analysis: melody, rhythm, harmony, form. So as we watch the parade of aspiring Eurovision champions duke it out in the pop pageant of geopolitics, let’s attune ourselves to the vocal colors, the timbral gestures, the ululations and the growls, to the panoply of visual and auditory stimuli demanding our attention and, more important (depending on where we live), our vote.
Featured Image: “Jamala” by Flickr User Andrei Maximov, CC BY-NC-ND 2.0
Maria Sonevytsky is Assistant Professor of Ethnomusicology at the University of California, Berkeley. Her first book, Wild Music: Sound and Sovereignty in Ukraine, will be out in October 2019 with Wesleyan University Press.