Beyond the Every Day: Vocal Potential in AI Mediated Communication
In summer 2021, sound artist, engineer, musician, and educator Johann Diedrick convened a panel at the intersection of racial bias, listening, and AI technology at Pioneer Works in Brooklyn, NY. Diedrick, a 2021 Mozilla Creative Media Award recipient and creator of such works as Dark Matters, is currently working on identifying the origins of racial bias in voice interface systems. Dark Matters, according to Squeaky Wheel, “exposes the absence of Black speech in the datasets used to train voice interface systems in consumer artificial intelligence products such as Alexa and Siri. Utilizing 3D modeling, sound, and storytelling, the project challenges our communities to grapple with racism and inequity through speech and the spoken word, and how AI systems underserve Black communities.” And now, he’s working with SO! as guest editor for this series (along with ed-in-chief JS!). It starts today, with Amina Abbas-Nazari, helping us to understand how Speech AI systems operate from a very limiting set of assumptions about the human voice: are we training it, or is it actually training us?
“Hi, good morning. I’m calling in from Bangalore, India.” I’m talking on speakerphone to a man with an obvious Indian accent. He pauses. “Now I have enabled the accent translation,” he says. It’s the same person, but he sounds completely different: loud and slightly nasal, impossible to distinguish from the accents of my friends in Brooklyn. –“The AI startup erasing call center worker accents: is it fighting bias – or perpetuating it?” (Wilfred Chan, 24 August 2022)
This telephone interaction was recounted in The Guardian’s reporting on a Silicon Valley tech start-up called Sanas. The company provides AI enabled technology that modifies call centre workers’ voices in real time to sound more “Western”. The company describes this venture as a solution to improve communication between typically American callers and call centre workers, who might be based in countries such as the Philippines and India. Meanwhile, research has found that major companies’ AI interactive speech systems exhibit considerable racial imbalance when trying to recognise Black voices compared to white voices. As a result, in the hopes of being better heard and understood, Google smart speaker users with regional or ethnic American accents relay that they find themselves contorting their mouths to imitate Midwestern American accents.
These instances describe racial biases present in voice interactions with AI enabled and mediated communication systems, whereby sounding ‘Western’ entitles one to more efficient communication, better usability, or increased access to services. This is not a problem specific to AI though. Linguistics researcher John Baugh, writing in 2002, describes how linguistic profiling is known to have resulted in housing being denied to people of colour in the US via telephone interactions. Jennifer Stoever’s The Sonic Color Line (2016) presents a cultural and political history of the racialized body and how it both informed and was informed by emergent sound technologies. AI mediated communication repeats and reinforces biases that pre-exist the technology itself, and also helps them become even more pervasive.
Mozilla’s commendable Common Voice project aims to ‘teach machines how real people speak’ by building an open source, multi-language dataset of voices to improve usability for non-Western speaking or sounding voices. But singer and musicologist Nina Sun Eidsheim describes how ‘a specific voice’s sonic potentiality [in] its execution can exceed imagination’ (7), and describes voices as having ‘an infinity of unrealised manifestations’ (8), in The Race of Sound (2019). Eidsheim’s sentiments describe a vocal potential, through musicality, that exists beyond ideas of accents and dialects, and vocal markers of categorised identity. As a practicing vocal performer, I recognise and resonate with Eidsheim’s ideas. I have a particular interest in extended and experimental vocality, especially gained through my time singing with Musarc Choir and working with artist Fani Parali. In these instances, I have experienced the pleasurable challenge of being asked to vocalise the mythical, animal, imagined, alien and otherworldly edges of the sonic sphere, to explore complex relations between bodies, ecologies, space and time, illuminated through vocal expression.
Following from Eidsheim, and through my own vocal practice, I believe AI’s prerequisite of voices as “fixed, extractable, and measurable ‘sound object[s]’ located within the body” is over-simplistic and reductive. Voices, within systems of AI, are made to seem only as computable delineations of person, personality and identity, constrained to standardised stereotypes. By highlighting vocal potential, I offer a unique critique of the way voices are currently comprehended in AI recognition systems. When we appreciate the voice beyond the homogenous, we give it authority and autonomy, ultimately leading to a fuller understanding of the voice and its sounding capabilities.
My current PhD research, Speculative Voicing, applies thinking about the voice from a musical perspective to the sound and sounding of voices in artificially intelligent conversational systems. Hereby the voice becomes an instrument of the body to explore its sonic materiality, vocal potential and extremities of expression, rather than being comprehended in conjunction with vocal markers of identity aligning to categories of race, gender, age, etc. In turn, this opens space for the voice to be understood as a shapeshifting, morphing and malleable entity, with immense sounding potential beyond what might be considered ordinary or everyday speech. Over the long term this provides discussion of how experimenting with vocal potential may illuminate more diverse perspectives about our sense of self and being in relation to vocal sounding.
Vocal and movement artist Elaine Mitchener perfectly exhibits the dissolution of the voice as ‘fixed’ in her performance of Christian Marclay’s No!, which I attended one hot summer’s evening at the London Contemporary Music Festival in 2022. Marclay’s graphic score uses cut outs from comic book strips to direct the performer to vocalise a myriad of ‘No’s.
Mitchener’s rendering of the piece involved the cooperation and coordination of her entire body, carefully crafting lips, teeth, tongue, muscles and ligaments to construct each iteration of ‘No.’ Each transmutation of Mitchener’s ‘No’s came with a distinct meaning, context, and significance, contained within the vocalisation of this one simple syllable. Every utterance explored a new vocal potential, enabled by her body alone. In the context of AI mediated communication, we can see that this way of working with the voice renders the idea of the voice as ‘fixed’ redundant. Mitchener’s vocal potential demonstrates that voices can and do exist beyond AI’s prescribed comprehension of vocal sounding.
In order to further understand how AI transcribes understandings of voice onto notions of identity, and vocal potential, I produced the practice project Polyphonic Embodiment(s) as part of my PhD research, in collaboration with Nestor Pestana, with AI development by Sitraka Rakotoniaina. The AI we created for this project is based upon a speech-to-face recognition AI that aims to be able to tell what your face looks like from the sound of your voice. The prospective impact of this AI is deeply unsettling, as its intended applications are wide-ranging, from entertainment to security, and, as previously described, AI recognition systems are inherently biased.
This multi-modal form of comprehending voice is also a hot topic of research being conducted by major research institutions including Oxford University and Massachusetts Institute of Technology. We wanted to explore this AI recognition programme in conjunction with an understanding of vocal potential and the voice as a sonic material shaped by the body. As the project title suggests, the work invites people to consider the multi-dimensional nature of voice and vocal identity from an embodied standpoint. Additionally, it calls for contemplation of the relationships between voice and identity, and individuals having multiple or evolving versions of identity. The collaboration with the custom-made AI software creates a feedback loop to reflect on how people’s vocal sounding is “seen” by AI, to contest the way voices are currently heard, comprehended and utilised by AI, and indeed the AI industry.
The video documentation for this project shows ‘facial’ images produced by the voice-to-face recognition AI, when activated by my voice, modified with simple DIY voice devices. Each new voice variation, created by each device, produces a different outputted face image. Some images perhaps resemble my face (e.g. Device #8), some might be considered more masculine (e.g. Device #10), and some are just disconcerting (e.g. Device #4). The speculative nature of Polyphonic Embodiment(s) is not to suggest that people should modify their voices in interaction with AI communication systems. Rather, the simple devices work with bodily architecture and exaggerate its materiality, considering it as a flexible instrument to explore vocal potential. In turn, this sheds light on the normative assumptions contained within AI’s readings of voice and its relationships to facial image and identity construction.
Through this artistic, practice-led research I hope to evolve and augment discussion around how the sounding of voices is comprehended by different disciplines of research. Taking a standpoint from music and design practice, I believe this can contest ways of working in the realms of AI mediated communication and shape the ways we understand notions of (vocal) identity: as complex, fluid, malleable, and ultimately not reducible to Western logics of sounding.
Featured Image: Still image from Polyphonic Embodiment(s), courtesy of author.
Amina Abbas-Nazari is a practicing speculative designer, researcher, and vocal performer. Amina has researched the voice in conjunction with emerging technology, through practice, since 2008 and is now completing a PhD in the School of Communication at the Royal College of Art, focusing on the sound and sounding of voices in artificially intelligent conversational systems. She has presented her work at the London Design Festival; Design Museum; Barbican Centre; V&A; Milan Furniture Fair; Venice Architecture Biennial; Critical Media Lab, Switzerland; Litost Gallery, Prague; and Harvard University, America. She has performed internationally with choirs and regularly collaborates with artists as an experimental vocalist.
REWIND! . . .If you liked this post, you may also dig:
What is a Voice?–Alexis Deighton MacIntyre
Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso
Mr. and Mrs. Talking Machine: The Euphonia, the Phonograph, and the Gendering of Nineteenth Century Mechanical Speech – J. Martin Vest
One Scream is All it Takes: Voice Activated Personal Safety, Audio Surveillance, and Gender Violence—María Edurne Zuazu
Echo and the Chorus of Female Machines—AO Roberts
On Sound and Pleasure: Meditations on the Human Voice–Yvon Bonenfant
SO! Amplifies: Immigrants Wake America Podcast and the Work of Engaged Digital Humanities
SO! Amplifies. . .a highly-curated, rolling mini-post series by which we editors hip you to cultural makers and organizations doing work we really really dig. You’re welcome!
Conceptualized at a time of rampant increase in anti-immigrant violence, Immigrants Wake America is a creative response to the growing bias and violence against immigrant women in the U.S., as seen in the Atlanta shootings, the rise in hate crimes since the onset of Covid-19, and the US-Mexico border crisis. We believe that storytelling allows us to find similarities and differences between ourselves and others, offering a humanizing counterpart to harmful media narratives. The podcast creates a living archive of stories not yet heard, serving as an audio intervention into how immigrant women’s (hi)stories are narrated and passed on.
Immigrants Wake America is a public humanities, community-engaged project of digital storytelling through podcasts, in partnership with the Tenement Museum in New York. It features storytellers who share their family stories about migration and the centrality of women in their life histories. These storytellers have submitted stories to the Tenement Museum’s digital archive Your Story, Our Story (YSOS).
Founded in 1988, the Tenement Museum focuses on immigration and immigrants to “foster a society that embraces and values the role of immigration in the evolving American identity.” YSOS, cofounded by Annie Polland and Kathryn Lloyd, is a digital archive that houses stories associated with immigration, migration, and cultural identity. Some of the storytellers are first generation immigrants, while others are descendants of immigrants, born and raised in the US; their great-grandparents or grandparents migrated to the US ages ago. Through YSOS, the Tenement Museum invites people across the country to share their stories in the online digital storytelling exhibit. Each story reveals one individual’s experience. Together, the stories help us see how the unique histories shape the nation, and the patterns that bind us together.
Through exploring and curating stories from Your Story, Our Story, we facilitate conversations that supplement and expand it. This makes possible the conception of an archive that is both dynamic and collaborative. Such an archive resists the colonization and appropriation of the lives and narratives of our storytellers. We navigate through the ethical conundrums that one might structurally and personally face in this collaborative endeavor. In our engagement with the archives at the Tenement Museum, we believe that our podcasting project really opens up the possibilities for an expansion of the archive.
We released our first episode, the Introductory Episode, on January 15th, 2022, and have since been consistently releasing one episode per month.
While our podcast does not claim to retrieve or lay out these microhistories in their entirety, at an early stage of its development, we came to realize the potential that the form of the podcast itself offers for a different kind of storytelling. In our podcast, we treat stories as primary documents instead of marginalia. Michelle Caswell (2014) uses the term “symbolic annihilation” to describe the absence or misrepresentation of marginalized communities in archives. She advocates the powerful forces of community archives in countering “symbolic annihilation.” In thinking about archives in The Archaeology of Knowledge, Michel Foucault is concerned with “the density of discursive practices” wherein he observes “systems that establish statements as events and things” (145). This system of statements (as events or things) is what contributes to the law of what can be said. Processes of digital communal archiving such as those done by the South Asian American Digital Archive (SAADA) or the Tenement Museum attempt to extend or expand the systematic possibility of events and things. Caswell and her colleagues have demonstrated the importance and success of the SAADA project. They have also pointed to the impossibility of representation in a traditional archive, which is built on violence committed on colonized and enslaved bodies, as Saidiya Hartman’s scholarship also eloquently points out.
Through our experience we’ve learnt that podcasts can serve as a transgressive-dynamic expansion of digital archiving, given their unique ability to cut across racial and gendered lines of preconceived sonic notions and their potential to expand the current techniques and media of digital archiving. We map this formal potential of the podcast, as it intersects with digital archiving, in the following ways:
First, narratorial voice.
We wanted our project to act as an intervention in the way in which immigrant women’s (hi)stories are consumed and passed on. We wanted to provide counter narratives. It was essential that the storytellers share their stories in their own voices, literally! The audio medium allows us to produce a space for listening to voices that are otherwise marginalized and/or demonized. –Le Li and Shruti Jain
Among the several unique and inspiring stories of resilience that the Tenement Museum houses, one such is a story by an immigrant case manager at the American Civic Association in Binghamton, Goretti Mugambwa. The museum and our podcast make it possible for her story to be narrated by herself in her voice. With her experience of working with the refugee and immigrant community she also does not just remain an individual voice, but acts to further a collective assertion.
Next, sonic variations.
Our storytellers’ voices are not just “characteristics” of the story but are an essential part of the story itself. We believe that each immigrant and each of their descendants brings to the story their unique tonal texture. This diversity destabilizes what immigrants and their descendants are expected to sound like. The sounds we add in the editing process are minimal. We try not to impose emotional cues and responses upon our listeners. –Shruti Jain and Le Li
The multiplicity of voices in our podcast–and therefore in the archive–is not just a “characteristic” of the aural storytelling or listening process, but is as much an essential part of the story itself. In line with what The Sonic Color Line reminds us, our work also finds that “sound frequently appears to be visuality’s doppelgänger in U.S. racial history” (Stoever 4). This leads to the coding of race as not just visual but aural too. We want to clarify that white-constructed ideas of how people of color must sound flatten out the complexities in how people within and across communities do sound. At the same time, these notions of white sonic normativity also create a strong sense of what one must or must not sound like in order to succeed in the racial capitalist world order. The storytellers of our podcast and we ourselves are of diverse backgrounds. This, for us, is a way to demonstrate the “complex range of sounds actually produced by people of color” (Stoever 43). As Nancy Morales argues in “Óyeme Voz: U.S. Latin@ & Immigrant Communities Re-Sound Citizenship and Belonging,” “the sound of ‘everyday voices’ mobilized against—and remarking on—the nation-state’s attempts to mark immigrant communities as vulnerable exerts an impactful and profoundly material agency.” With its conversational and collaborative format, our podcast serves as a dynamic medium to represent (hi)stories that complicate generic conventions in critical ways.
We have also been personally deeply impacted by the process of working on this podcast. We have made lasting bonds with our colleagues and storytellers alike. The storytellers of our podcast act not just as guests, but as collaborators and stakeholders. Instead of interpreting the stories in our own way and retelling the stories, we collaborate with the storytellers, and facilitate the unfolding of hidden stories by the storytellers. Dr. Lisa Yun, Professor of English at Binghamton University, and Kathryn Lloyd, Senior Director of Programs, Tenement Museum, have been advisors and the executive producers of the podcast. Together with Lloyd and Yun, we built a project on the ethos of collaboration.
The editing process of IWA, too, is different. Rather than making individual editorial decisions, we engage the storytellers directly in post-production. After finishing a first edit of an episode collaboratively between ourselves, we send it to the storytellers for their feedback and approval before releasing it. Sometimes, the storytellers do suggest changes. Based on their feedback, we re-edit the episode and eventually release it after the storytellers’ approval. We have also innovated methods of community editing, where we edit in groups as large as 15 people.
The podcast medium makes Immigrants Wake America an ideal project for the public humanities. As opposed to lengthier podcasts, each episode of our podcast is edited down to 15-20 minutes. These can be used by educators as an in-class resource to generate discussion and activities. Community listeners could tune in during lunch breaks, get-togethers, cooking, driving or doing chores. Our episodes can also serve as conversation starters and help facilitate affective bonds among immigrants and non-immigrants alike.
The final episode of our first season, “Finding Our Grandmother in the Records,” aired just last week, and a second season is in the works.
As a way to expand this project, our second season will feature storytellers from our local community in addition to Your Story, Our Story. We plan to have units within our project dedicated to translation, recording and editing, and creating teaching resources. We aim for meaningful and engaged conversations and try to blur the supposed boundaries between the university and the community. Join us!
The first season of Immigrants Wake America was sponsored through the Institute for Advanced Studies in the Humanities at Binghamton University and a Public Humanities Grant from Humanities New York. Dr. Lisa Yun, Professor of English at Binghamton University, and Kathryn Lloyd, Senior Director of Programs, Tenement Museum, have been our advisors and the executive producers of the podcast. IWA is available on major streaming platforms such as Spotify, Google Podcasts, Apple Podcasts, Amazon Music, Soundcloud, and Audible.
Le Li and Shruti Jain are pursuing their PhDs at Binghamton University in the Translation Research and Instruction Program and the English Department respectively. They were Humanities New York Public Humanities fellows (2021-22) and graduate fellows of the Institute for Advanced Studies in the Humanities (IASH) at Binghamton University (2021-22). Through their podcast project and their work with digital community archives, Le and Shruti are currently exploring the intersection between podcasts and digital archiving. They try to capitalize on the unique ability that the form of the podcast offers to cut across racial and gendered lines of preconceived sonic notions, which makes possible the conception of an archive that can be both dynamic and collaborative. Le’s research interests include translation studies, cultural studies, diaspora studies, and public humanities. Shruti’s PhD focuses on the Enlightenment, British Empire and the relationalities between race and caste formations.
REWIND!…If you liked this post, you may also dig all this good stuff about sound studies pedagogy! Good luck with Fall semester, folks!:
“Heavy Airplay, All Day with No Chorus”: Classroom Sonic Consciousness in the Playlist Project—Todd Craig
SO! Podcast #79: Behind the Podcast: deconstructing scenes from AFRI0550, African American Health Activism – Nic John Ramos and Laura Garbes
The Sounds of Anti-Anti-Essentialism: Listening to Black Consciousness in the Classroom- Carter Mathes
Making His Story Their Story: Teaching Hamilton at a Minority-serving Institution–Erika Gisela Abad
Teaching Soundwalks in a Course on Gentrification, Black Music, and Corporate America–Rami Toubia Stucky
Deejaying her Listening: Learning through Life Stories of Human Rights Violations– Emmanuelle Sonntag and Bronwen Low
Audio Culture Studies: Scaffolding a Sequence of Assignments– Jentery Sayers
Deep Listening as Philogynoir: Playlists, Black Girl Idiom, and Love–Shakira Holt
“Toward A Civically Engaged Sound Studies, or ReSounding Binghamton”–Jennifer Lynn Stoever
Listening to #Occupy in the Classroom–D. Travers Scott
SO! Podcast #71: Everyday Sounds of Resilience and Being: Black Joy at School–Walter Gershon
Sounding Out! Podcast #13: Sounding Shakespeare in S(e)oul– Brooke Carlson
A Listening Mind: Sound Learning in a Literature Classroom–Nicole Brittingham Furlonge
My Voice, or On Not Staying Quiet–Kaitlyn Liu
(Re)Locating Soundscapes of Schooling: Learning to Listen to Children’s Lifeworlds–Cassie J. Brownell
If You Can Hear My Voice: A Beginner’s Guide to Teaching–Caroline Pinkston
Mukbang Cooks, Chews, and Heals – David Lee
SO! Podcast #80: Refugee Realities Miniseries–Steph Ceraso