The Cyborg’s Prosody, or Speech AI and the Displacement of Feeling

In summer 2021, sound artist, engineer, musician, and educator Johann Diedrick convened a panel at the intersection of racial bias, listening, and AI technology at Pioneerworks in Brooklyn, NY. Diedrick, 2021 Mozilla Creative Media award recipient and creator of such works as Dark Matters, is currently working on identifying the origins of racial bias in voice interface systems. Dark Matters, according to Squeaky Wheel, “exposes the absence of Black speech in the datasets used to train voice interface systems in consumer artificial intelligence products such as Alexa and Siri. Utilizing 3D modeling, sound, and storytelling, the project challenges our communities to grapple with racism and inequity through speech and the spoken word, and how AI systems underserve Black communities.” And now, he’s working with SO! as guest editor for this series (along with ed-in-chief JS!). It kicked off with Amina Abbas-Nazari’s post, helping us to understand how Speech AI systems operate from a very limiting set of assumptions about the human voice. Then, Golden Owens took a deep historical dive into the racialized sound of servitude in America and how this impacts Intelligent Virtual Assistants. Last week, Michelle Pfeifer explored how some nations are attempting to draw sonic borders, despite the fact that voices are not passports. Today, Dorothy R. Santos wraps up the series with a meditation on what we lose due to the intensified surveilling, tracking, and modulation of our voices. [To read the full series, click here] –JS
—

In 2010, science fiction writer Charles Yu wrote a story titled “Standard Loneliness Package,” where emotions are outsourced to another human being. While Yu’s story is a literal depiction, albeit fictitious, of what might be entailed and the considerations that need to be made of emotional labor, it was published a year prior to Apple introducing Siri as its official voice assistant for the iPhone. Humans are not meant to be viewed as a type of technology, yet capitalist and neoliberal logics continue to turn to technology as a solution to erase or filter what is least desirable even if that means the literal modification of voice, accent, and language. What do these actions do to the body at risk of severe fragmentation and compartmentalization?
I weep.
I wail.
I gnash my teeth.
Underneath it all, I am smiling. I am giggling.
I am at a funeral. My client’s heart aches, and inside of it is my heart, not aching, the opposite of aching—doing that, whatever it is.
Charles Yu, “Standard Loneliness Package,” Lightspeed: Science Fiction & Fantasy, November 2010
Yu sets the scene by providing specific examples of feelings of pain and loss that might be handed off to an agent who absorbs the feelings. He shows us, in one way, what a world might look and feel like if we were to go to the extreme of eradicating and offloading our most vulnerable moments to an agent or technician meant to take on this labor. Although written well over a decade ago, its prescient take on the future of feelings wasn’t too far off from where we find ourselves in 2023. How does the voice play into these connections between Yu’s story and what we’re facing in the technological age of voice recognition, speech synthesis, and assistive technologies? How might we re-imagine having the choice to displace our burdens onto another being or entity? Taking a cue from Yu’s story, technologies are being created that pull at the heartstrings of our memories and nostalgia. Yet what happens when we are thrust into a perpetual state of grieving and loss?
Humans are made to forget. Unlike a computer, we are fed information required for our survival. When it comes to language and expression, it is often a stochastic process of figuring out for whom we speak and who is on the receiving end of our communication and speech. Artist and scholar Fabiola Hanna believes polyvocality necessitates an active and engaged listener, which then produces our memories. Machines have become the listeners to our sonic landscapes as well as capturers, surveyors, and documenters of our utterances.

The past few years may have brought remarkable advances in voice tech from companies such as Amazon and Sanas AI, a voice recognition platform that allows a user to apply a vocal filter to any human voice with a discernible accent, transforming the speech into Standard American English. Yet their hopes for accent elimination and voice mimicry foreshadow a future of design without justice and software development sans cultural and societal considerations, something I work through in my artwork in progress, The Cyborg’s Prosody (2022-present).
The Cyborg’s Prosody is an interactive web-based artwork (optimized for mobile) that requires participants to read five vignettes that increasingly incorporate Tagalog words and phrases that must be repeated by the player. The work serves as a type of parody, as an “accent induction school” — providing a decolonial method of exploring how language and accents are learned and preserved. The work is a response to the creation of accent reduction schools and coaches in the Philippines. Originally, the work was meant to be a satire and parody of these types of services, but shifted into a docu-poetic work of my mother’s immigration story and learning and becoming fluent in American English.

Even though English is a compulsory language in the Philippines, it is a language learned within the parameters of an educational institution and not common speech outside of schools and businesses. From the call center agents hired at Vox Elite, a BPO company based in the Philippines, to a Filipino immigrant navigating her way through a new environment, the embodiment of language became apparent throughout the stages of research and the creative interventions of the past few years.
In Fall 2022, I gave an artist talk about The Cyborg’s Prosody to a room of predominantly older, white, cisgender male engineers and computer scientists. Apparently, my work caused a stir in one of the conversations between a small group of attendees. A couple of the engineers chose to not address me directly, but I overheard a debate between guests with one of the engineers asking, “What is her project supposed to teach me about prosody? What does mimicking her mom teach me?” He became offended by the prospect of a work that de-centered his language, accent, and what was most familiar to him.
The Cyborg’s Prosody is a reversal of what is perceived as a foreign accented voice in the United States into a performance for both the cyborg and the player. I introduce the term western vocal drag to convey the caricature of gender through drag performance, which is apropos and akin to the vocal affect many non-western speakers effectuate in their speech.
The concept of western vocal drag became a way for me to understand and contemplate the ways that language becomes performative through its embodiment. Whether it is learning American vernacular or the complex tenses that give meaning to speech acts, there is always a failure or queering of language when a particular affect and accent is emphasized in one’s speech. The delivery of speech acts is contingent upon setting, cultural context, and whether or not there is a type of transaction occurring between the speaker and listener. In terms of enhancement of speech and accent to conform to a dominant language in the workplace and in relation to global linguistic capitalism, scholar Vijay A. Ramjattan states that there is no such thing as accent elimination or even reduction. Rather, an accent is modified. The stakes are high when taking into consideration the marketing and branding of software such as Sanas AI that proposes an erasure of non-dominant foreign accented voices.
The biggest fear related to the use of artificial intelligence within voice recognition and speech technologies is the return to a Standard American English (and accent) preferred by a general public that ceases to address, acknowledge, and care about linguistic diversity and inclusion. The technology itself has been marketed as a way for corporations and the BPO companies they hire to mind the mental health of the call center agents subjected to racism and xenophobia just by the mere sound of their voice and accent. The challenge, moving forward, is reversing the need to serve the western world.
A transorality or vocality presents itself when thinking about scholar April Baker-Bell’s work Black Linguistic Consciousness. When Black youth are taught and required to speak with what is considered Standard American English, this presents a type of disciplining that perpetuates raciolinguistic ideologies of what is acceptable speech. Baker-Bell focuses on an antiracist linguistic pedagogy where Black youth are encouraged to express themselves as a shift towards understanding linguistic bias. Deeply inspired by her scholarship, I started to wonder about the process for working on how to begin framing language learning in terms of a multi-consciousness that includes cultural context and affect as a way to bridge gaps in understanding.

Or, let’s re-think this concept or idea that a bad version of English exists. As Cathy Park Hong brilliantly states, “Bad English is my heritage…To other English is to make audible the imperial power sewn into the language, to slit English open so its dark histories slide out.” It is necessary for us all to reconfigure our perceptions of how we listen and communicate, so that we no longer perpetuate the seeking of familiarity and agreement but instead encourage respecting and honoring our differences.
—
Featured Image: Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos
—
Dorothy R. Santos, Ph.D. (she/they) is a Filipino American storyteller, poet, artist, and scholar whose academic and research interests include feminist media histories, critical medical anthropology, computational media, technology, race, and ethics. She has her Ph.D. in Film and Digital Media with a designated emphasis in Computational Media from the University of California, Santa Cruz and was a Eugene V. Cota-Robles fellow. She received her Master’s degree in Visual and Critical Studies at the California College of the Arts and holds Bachelor’s degrees in Philosophy and Psychology from the University of San Francisco. Her work has been exhibited at Ars Electronica, Rewire Festival, Fort Mason Center for Arts & Culture, Yerba Buena Center for the Arts, and the GLBT Historical Society.
Her writing appears in art21, Art in America, Ars Technica, Hyperallergic, Rhizome, Slate, and Vice Motherboard. Her essay “Materiality to Machines: Manufacturing the Organic and Hypotheses for Future Imaginings,” was published in The Routledge Companion to Biology in Art and Architecture. She is a co-founder of REFRESH, a politically-engaged art and curatorial collective and serves as a member of the Board of Directors for the Processing Foundation. In 2022, she received the Mozilla Creative Media Award for her interactive, docu-poetics work The Cyborg’s Prosody (2022). She serves as an advisory board member for POWRPLNT, slash arts, and House of Alegria.
—

REWIND! . . .If you liked this post, you may also dig:
Your Voice is (Not) Your Passport—Michelle Pfeifer
“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens
Beyond the Every Day: Vocal Potential in AI Mediated Communication –Amina Abbas-Nazari
Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso
The Sound of What Becomes Possible: Language Politics and Jesse Chun’s 술래 SULLAE (2020)—Casey Mecija
Look Who’s Talking, Y’all: Dr. Phil, Vocal Accent and the Politics of Sounding White–Christie Zwahlen
Listening to Modern Family’s Accent–Inés Casillas and Sebastian Ferrada
Tuning In to the Desi Valley: Getting to Know a Community via Radio

Sound has a peculiar relationship to mindfulness; zoning in and out, active and passive forms of listening while we situate our listening practices alongside other daily activities. Especially when it comes to driving, listening to something or someone or just singing aloud by myself, I have realized, helps me drown out other noises of alertness. Over the years I have come to value background music or chatter and especially radio programming that takes the burden of curation and scheduling off my back, in all sorts of tasks that require deep concentration. Enough and more has been said about the visual-bias in various forms of ethnographic inquiry (see Andrew C. Sparkes’s “Ethnography and the senses” for a good example). Without belaboring these arguments, I also find that knowing through listening and listening as a mode of non-haptic yet immersive and intimate engagement can also prove to be a fruitful method of inquiry, especially in our post-pandemic worlds, where it feels a lot harder to establish intimacy. The United Nations noted that radio, in particular, “provided solace” during that period of physical distancing and social isolation.
For me, radio sparked my accidental realization and foregrounding of sonic methods as an itinerant means of getting to know new things, people and surroundings in life and research when I moved from New York to the San Francisco Bay Area in mid-2022 to start a new position as a postdoctoral researcher. Knowing that I would continue living in California for the near future, after eight long years of having deferred driving in America, I decided to learn to drive and buy a car. I was also especially excited to be moving to Sunnyvale, a city in the Southern Peninsula, located between better-known places like Palo Alto and San Jose. Sunnyvale is often jokingly called the desi capital, perhaps the most Indian of any ‘Little India’ you could find in America. As shorthand, desi, a Hindi word, refers to anyone and everything with ties to the South Asian subcontinent. In more recent years, the term has gained currency especially among South Asian diasporic communities to self-refer to culture, music, food, often to signify the presence and strength of transnational ties (between India and their countries of settlement).
What I hadn’t anticipated was that driving brought about a new connection to the medium of radio, as I started tuning in to take my mind off the jitters that come with the new sounds of an automobile-dependent region: fast moving rubber hitting the freeway tarmac and the lane changes and the scalar adjustments that demand driving tens of miles to ensure you still have a social life stitched together across the vastness of California. I began to find the act of tuning in and out of stations fascinating, especially how the radio as a device holds the parallel realities of so many people with different interests, languages and politics together but separate. Scholarship and even casual listening shows that non-English radio stations catering to various immigrant and other communities have existed for a long time in the US and elsewhere (I wish pondering the sonic geographies of Radio Garden, a web-based map interface that allows listeners to access any free-to-air radio stations across the world, wasn’t beyond the scope of this post!). Both as an ethnographer and as an insider-outsider within the larger Indian immigrant and diaspora community (but a newcomer to the Bay Area), tuning into the local desi radio station while driving offered me a good way to enter the desi community of the Bay Area, to know what it means to be desi and perform Indianness in 2022, in a place where I regularly see so many Indians and South Asians every single day.
As a friend who recently visited me in Sunnyvale remarked as we waited for our table at the always-busy Madras Cafe, Sunnyvale feels so much like India! And I felt it too; what she was referring to was not only the very visible presence of Indian people all around town or the abundance of restaurants catering to various sub-regional cuisines from India. What felt different to me here was how relaxed everyone looked in grocery stores, restaurants and elsewhere, how Sunnyvale is so utterly remade as a pan-Asian but mostly Indian space that even the smallest performances of fitting in feel unnecessary. In fact, shopping here reminded me of a point Purnima Mankekar’s Indian narrators made in her iconic essay on Indian grocery shopping in the San Francisco Bay Area (2010): that Indian stores in Sunnyvale and Milpitas are places where “white people look out of place” as compared to the ones in Berkeley.
The juxtaposition of the non-performances and the weight of being comfortably in place in a community like Sunnyvale only settles on the mind and body very slowly, like a faint but familiar smell from home. Words and demands often stumble out of my mouth at the grocery store, as if I am allowed and I will be completely understood. Immigrant life in America is steeped in language acrobatics, balancing being understood with becoming deliberately opaquely incomprehensible, using one accent for the ones from home, one for those who make you feel at home, and then the American accent for the Americans outside. The contrast of places like Sunnyvale, Fremont, Milpitas and other similar Northern California cities that have been transformed by immigrant presence is not just one of tangible and observable things, but sonic markers too, like Bolly 92.3FM.

Bolly 92.3FM is the default desi radio station that services the entire San Francisco Bay Area. As the name suggests, for the most part the station plays popular Bollywood songs from recently released movies and albums, but just as other stations do in India, Bolly92.3FM leverages its listener demographics at different times of the day to also play classics and hits from older Hindi films during late night slots. Interestingly, as the presenters repeat the station name and jingle time and again, they also remind you that you are listening to Bay Area’s Bollywood station owned by the Silicon Valley Asian media network. The name Bolly92.3FM is a play on the broad familiarity with Bollywood in the US even as Indian audiences globally have moved away from the older connotations of Bollywood as North Indian cinema with song, dance and people dressed in flashy clothes. Much like the hyper-authentic Indian restaurants that serve regional cuisines such as Andhra, Tamil, Gujarati, Rajasthani and Marathi food to their loyal and affording immigrant patrons, Bolly92.3FM has also configured its programming to cater to different regional and linguistic communities from India in the Bay Area. For instance, Saturday morning and afternoon slots are dedicated to Telugu programs: everything is in Telugu (language), from the hosts’ discussions to the songs being played as well as the actual topics being discussed, as if the station turns into a different station with the implicit acknowledgement of the substantial cultural presence and possibly the sonic and financial power of Telugu listeners among the wider Indian community here. There are similar slots dedicated to Gujarati language programming.
In addition to language and topical interests, listening to Bolly92.3FM has been instructive in getting a feel for communal desires, aspirations and anxieties through its advertising. There are fellow desi real estate agents, tax planners, dentists, travel agencies and coaching centers, each signaling why they are trustworthy. Some remind their audiences of their shared cultural background as key to them being able to understand their customers’ needs; others also indicate their familiarity with America: one realtor is the only Indian-origin person to feature in a top realtor list, another tax planner’s family has been in the US for three generations and thus he is well aware of the nitty-gritties of transnational estate planning. A known Indian-origin realtor in the area even sponsors his own radio show on the weekends where he takes questions from prospective home buyers and sellers. Before and beyond giving them financial advice, he often explains how fellow Indian immigrants think about financial opportunities, investments, how they might seek social validation from fellow Indians who might see their home and so on. He weaves in such exposition before talking dry financial facts about mortgages. The same tax planner and his co-host occasionally offer historical accounts of changing real estate trends, how certain places used to be affordable for Indian immigrant buyers and how new places are becoming of interest as vacation homes for more affluent Indian immigrants.

Hearing the tax planners and real estate agents plot these dynamic and speculative maps of South Asian financial, cultural and political futures week-after-week felt like witnessing what many historical texts on migration within the Bay Area have described as waves in the past. As I mentioned earlier, Sunnyvale is not the only ‘Little India’ in the Bay Area, let alone in California, but rather, is a more recent iconic place in the Indian and South Asian diaspora map where the younger and newer immigrants are finding homes. Fremont, Milpitas, and Hayward in the East Bay closer to Oakland, and San Jose in the South Bay, saw similar waves of Indian immigrant settlements in the past, many of whom are now far more affluent than their younger counterparts. In my short time since moving here, I learned both from conversations with friends who grew up in the area as well as from communication scholar Anne Marie Todd’s work on the past and present of Santa Clara Valley that this region has not only seen waves of migration from settlers across the world but that, with each incoming wave and turn in occupational trends from farming to railroads to IT work, multiple communities have remade cities in the Bay Area over the decades.
I also find advertisements as well as talk shows interesting because they offer a more proximate and concentrated triangulation of otherwise scattered, overheard communal talk – I’ve heard things about Indian immigrants and real estate, their aspirations for their children to get into Ivy league colleges. I have also been asked at the grocery store if I know people looking to get married; I’ve overheard aunties at the grocery store consulting each other about dealing with death, financial loss, planetary alignments and more. But it is through the radio that these private interests, anxieties and futures take more collective and articulate forms.
To speak, to be heard, and to be understood as intended are as important as visual representation to allow for feeling in place in the world and for feeling at home in the United States. Very simply put, in the American context, while Black and Brown vocal expression and volume or the stereotype of loudness have been historically stigmatized as a part of the larger racist depiction of ‘unruly’ bodies, various forms of Asian speech, languages and expression including loudness but also silence and the absence of vocality, have also been racialized against the backdrop of white socio-linguistic normativity. Specifically, Indian and South Asian immigrants have been repeatedly represented in popular American culture as muted characters whose interiority is either irrelevant to the plot or cannot be accessed. Think of Raj Koothrappali from the show The Big Bang Theory, an accomplished scientist at the prestigious Caltech university who loses his voice around women. In accounts of Indian tech workers in the US from the Y2K era, one finds multiple articles casting them as the back-office worker – good at laborious and boring work but not presentation material. Some of these stereotypes have changed and splintered as more South Asian technologists have gone on to become successful executives and leaders in US companies, but vocality as a form of publicity can be crucial to sonic forms of belonging.

When I arrived in the Bay Area, I was still trying to figure out how to form a community and how to immerse myself in the desi community here, somewhat selfishly to get a glimpse of how Indian presence is remaking the Valley, not only through IT work but also through cultural and political performances. But I also wanted to get to know people, be known by people around. Olakhita means “people known to us” in Gujarati, my mother tongue; in Hindi jaan-pehchaan means to be in each other’s knowing that gives you some claim and affiliation over others without deep intimacy, not quite acquaintances or neighbors like in the American context. Apart from the numerous WhatsApp and Facebook groups aimed at desi folks living in the same neighborhood or city, Bolly92.3FM acts as a beacon for Indians and South Asians spread across San Francisco, the East Bay and the South Bay areas, offering a medium for the various Indian associations and event organizers to reach out to thousands and invite them to Diwali, Dussehra and other festival celebrations as well as any major concerts by visiting Indian artists in the area.

This is far from the first or only time that a radio station has facilitated the forging of affective ties and social and material connections in diaspora. Rather this post recounts how active and passive listening to and through the station revealed over time how much South Asian presence has transformed the Bay Area. I attended the massive Diwali and Dussehra events advertised on 92.3, and once there, I could recognize so many of the organizers and sponsors’ names – some of them were the same tax planners and realtors that also run ads and sponsored segments on the station. One week in late September, an ad played on the radio, announcing that a famous hotel in downtown San Jose was now able to accommodate more than a thousand guests and offer a special entryway for the baraat procession (when the groom’s party of a few hundred comes dancing up to the wedding venue). The ad ended with the contact details of a South Asian representative of the hotel who could handle all queries related to Indian wedding arrangements! Bolly92.3FM mediates and shapes the collective desi identity through its programming and advertising, in turn also stitching, materializing and rendering visible a map of the Indian community spread across dozens of non-contiguous cities and neighborhoods in the Bay Area.

It bears noting, however, that the idea of Indianness or desi-ness (an imagined brotherhood among all the expats here) advanced through the radio programs, advertisements as well as the cultural celebrations, is very much a nationalistic and Hindu-dominant one. Although India is an ethnically, linguistically and culturally diverse country and not all people of Indian origin in the Bay Area are Hindu, the radio announcers never really celebrate or discuss Eid, Christmas, or Thanksgiving as events of possible interest. Just based on what is said and what is left unsaid, the dominant self-imagination of the Bay Area desi community as advanced through the radio station feels like it quietly aligns with the dominant religious and political imagination of India as Hindu, middle-class, post-religious and post-caste.
In the process of seeing this map render in my own imagination through regular listening, I also realized how this form of distant listening replicated the mode of jaan-pehchaan (getting to know) for the itinerant immigrant-ethnographer. There is always the pre-fieldwork moment, the slightly promiscuous exploratory moment of getting to know and immersing oneself before one can articulate stakes and research questions. It is also often a period of deep uncertainty and ambivalence, since, as ethnographers, responsibility for and towards our field, communities, and interlocutors is always at the center of every project. We are always reminded not to be extractive and to think deeply about power relations even as we attempt to forge meaningful ties with those whom we want to observe and learn from and write about. Much like other visa-workers and international scholars in US academia, being a non-citizen ethnographer engenders multiple kinds of precarities: there is no straightforward or replicable guidebook on how to establish rapport, gain access, build trust and more with a community. More importantly, the itinerant-immigrant ethnographer’s relationships are also always interrupted, prone to arbitrary border restrictions and chronic deracination.
In the early days of digital ethnography, Kate Crawford powerfully argued to reframe acts of lurking (silently and passively hanging out in online communities) as forms of listening and by extension, listening as a concomitant and constitutive practice when we consider participation as speaking or having a voice. In the case of diasporic radio, as I realized, not only is the act of listening quite literal but it also affords and reinforces the vitality of different modes of agentic power and participation, those marked by ambivalence, yet-to-be gained legitimacy; forms of minor participation if you will. Via Crawford, listening and/as lurking also emerges as specifically racially inflected modes of agentic participation against the backdrop of media policy and the emphasis on free expression and speech as the ultimate realization of democratic power.
Among diasporic communities and further among itinerant immigrants within those communities, listening, overhearing and eavesdropping become the de facto modes of democratic and communitarian participation. Listening to the radio as a way of immersion is not a solution to these enduring dilemmas of ethical ethnography but to borrow from the analogy of Californian driving, listening to the radio, just like other forms of digital and analog lurking, allowed me an ‘on-ramp’ to gradually merge and embed myself in the larger South Asian diasporic community.
—
Featured Image: “Driving” by Flickr User AnnaNakami (CC BY-NC 2.0)
—
Noopur Raval is an interdisciplinary researcher interested in understanding global futures of work, the life and work decisions made by immigrant workers in tech companies, changing values and moral norms and projects of personhood especially in postcolonial settings. Noopur received a PhD in Informatics from University of California Irvine (UCI) in September 2020 and, through July 2022 was a postdoc researcher at the AI Now Institute at New York University. Noopur is currently a postdoctoral researcher at UC Santa Cruz – Silicon Valley Extension in the Computational Media department, working with Dr Norman Su. In Fall 2023, Noopur will join the Department of Information Studies at UCLA as an assistant professor.
—

REWIND! . . .If you liked this post, you may also dig:
“Gendered Soundscapes of India, an Introduction“–Monika Mehta and Praseeda Gopinath
The Queer Sound of the Dandiya Queen, Falguni Pathak–-Pavitra Sundar
“Out of Sync: Gendered Location Sound Work in Bollywood“—Priya Jaikumar
SO! LA: Sounding the California Story–Bridget Hoida
“Vous Ecoutez La Voix du Peuple”: The Kreyol Language Pirate Radio Stations of Flatbush, Brooklyn–David Goren
Listening (Loudly) to Spanish-language Radio–Dolores Inés Casillas


















