- by Kathryn Huether
- in Article, artificial intelligence, Capitalism, Cultural Studies, Digital Humanities, Digital Media, Hate & Non-Human Listening Series, Humanism, Identity, immigration and migration, Information, Internets, Language, Listening, Politics, Public Debate, Race, Rhetoric, social media, sonification, Sound, Sound Studies, Speech, Voice
Hate & Non-Human Listening, an Introduction

In January 2026, WIRED reported that U.S. Immigration and Customs Enforcement (ICE) has begun using Palantir’s AI tools to process public tip-line submissions. The system does not simply store or relay these reports. It processes English-language submissions, condensing them into what is called a “BLUF”—a “bottom line up front” summary that allows agents to quickly assess and prioritize cases.
Efficiency is the dominant framing: the system promises speed, clarity, and control over overwhelming volumes of information. Yet such efficiency depends on a prior reduction, in which expression is detached from the conditions of its articulation and reconstituted as data. In this form, listening no longer risks misunderstanding; it eliminates it.
Nor does this infrastructure operate in isolation. It relies on distributed participation in which listening is recast as vigilance. A recent ICE public X (Twitter) post encouraged residents to report “suspicious activity,” assuring them that doing so would make their communities safer.
The language is familiar, even reassuring. But it depends on a prior act of interpretation: that certain voices, presences, or behaviors are already legible as threat. Listening here becomes pre-classification—identifying danger in advance and acting on that identification as if it were already known. Rather than an isolated case, this development signals a broader transformation in how immigration and enforcement are governed. As legal and policy analyses increasingly note, artificial intelligence is becoming “one of the fundamental operating tools of policing,” deployed across domains ranging from speech and text analysis to risk assessment and document verification. Systems such as USCIS’s Evidence Classifier, which tags and prioritizes key documents within case files, and platforms like ImmigrationOS, which aggregate data across agencies to guide enforcement decisions, do not simply process information—they reorganize it. What matters is not only what is said, but whether it aligns—across time, across records, across bureaucratic expectations. Listening becomes continuous and anticipatory, oriented toward detecting inconsistency, deviation, and risk before any claim can be made or contested.
A very different narrative circulates alongside these developments. A recent BBC article suggested that AI chatbots can function as unusually “good listeners”—patient, nonjudgmental, even compassionate. Users describe these systems as offering space for reflection, sometimes preferring them to human interlocutors. Yet what is at work is not attention or relation, but pattern recognition trained to simulate understanding. Taken together, these examples reveal a shared transformation. Across both enforcement systems and everyday interaction, listening is increasingly detached from sensation, exposure, and accountability, becoming a process of extraction and classification rather than relation. As Dorothy Santos argues in her account of speech AI, machines do not simply assist human listening; they assume its position, becoming “the listeners to our sonic landscapes” while also acting as the capturers, surveyors, and documenters of our utterances. What follows from this shift is not just a change in who listens, but in what listening is. Listening no longer names an encounter between subjects; it describes a technical operation distributed across infrastructures that register, store, and act on sound without ever hearing it.
This shift is what I call “nonhuman listening.”
Nonhuman listening names both an infrastructural condition and a set of practices through which listening is reorganized as a technical operation. It describes a mode of perception distributed across systems that capture, process, and act on sound without exposure to it as experience, as well as the procedures—classification, ranking, prediction—through which sound is rendered actionable in advance. At stake is not simply the emergence of new technologies, but a reorganization of what listening has long been understood to do. Listening unfolds across thresholds of perception, attention, and care, shaped by what can be sensed, cultivated, or ignored. From its earliest formulations, it has been understood not as passive reception but as an ethically charged capacity. Aristotle’s distinction between akousis (hearing) and akroasis (listening) marks this divide, reserving listening for forms of attention capable of judgment and response. In this sense, listening has always named both openness and control: a posture of receptivity toward others and a way of organizing the world.
Nonhuman listening amplifies an older logic: not all voices are heard, and not all forms of speech register as meaning. Listening does not begin from neutrality. Norms organize it in advance, determining what registers as signal, who gets to hear, and whose speech counts as intelligible. Meaning and noise do not inhere in sound itself; they emerge through historically sedimented expectations about voice, difference, and belonging.
Sound studies has long challenged the assumption that listening inherently connects or humanizes. Listening does not operate as an immediate or intimate relation; it relies on frameworks that precondition perception. Jonathan Sterne shows that claims about sonic immediacy function less as empirical truths than as ideological formations—narratives that naturalize particular social arrangements while obscuring how listening renders some forms of speech legible and others unintelligible. Listening does not simply receive the world—it organizes it.
At the same time, theoretical and experimental approaches foreground the instability of this organization. Voices do not exist as stable entities prior to their mediation; they “show up as real,” as Matt Rahaim writes, through specific practices and infrastructures that render them intelligible, contested, or indeterminate. Jean-Luc Nancy conceptualizes listening as resonance, emphasizing exposure—the possibility that listening might unsettle the subject—while also underscoring that such openness never distributes evenly. John Cage and Pauline Oliveros treat listening as a disciplined practice that requires cultivation and can fail as easily as it attunes. Listening is not given; it is trained.

Across these accounts, listening operates within regimes of power. Jacques Attali locates listening within governance, where institutions determine what can be heard, what must be silenced, and what becomes disposable. Trauma and memory studies intensify these stakes. Henry Greenspan shows that listening to testimony never occurs as a singular or sufficient act, and that extractive modes of attention can reproduce violence rather than alleviate it. Ralina L. Joseph’s concept of radical listening reframes listening as an ethical orientation—one that demands accountability to power, difference, and fatigue, and that attends to how speakers wish to be heard. As she writes, “the easiest way to refuse to listen is to keep talking.”
Taken together, these accounts point to a more difficult claim: listening is not simply uneven—it is directional. It can orient toward exposure and relation, or toward certainty and verification. When listening turns toward certainty, it no longer encounters speech as an address. It apprehends speech in advance, so that certain voices register not as claims or appeals but as warnings or threats.
Such orientation has precedents that are neither abstract nor metaphorical. During the 1937 Parsley Massacre, Dominican soldiers used pronunciation as a test of belonging. Suspected Haitians were asked to say the word perejil (parsley); those whose speech did not conform to expected phonetic norms were identified as foreign and often killed. Listening here did not register meaning or intent. It functioned as classification—reducing speech to a signal of difference and acting on that difference as if it were already known.
This logic persists in contemporary enforcement practices, albeit in different registers. Recent encounters with U.S. immigration agents reveal how accent continues to operate as a proxy for suspicion and a trigger for intervention. In multiple reported incidents, individuals have been stopped or detained and asked to account for their citizenship on the basis of how they sound: “Because of your accent,” one agent stated when asked to justify the demand for documentation. In another case, an agent explicitly linked auditory difference to disbelief, telling a driver, “I can hear you don’t have the same accent as me,” before repeatedly questioning where he was born.
In these moments, listening again operates as pre-classification. Accent is not heard as variation, history, or movement, but as evidence—an audible marker of non-belonging that precedes and justifies further scrutiny. What is at stake is not mishearing, but a mode of listening trained to stabilize difference as risk. Speech becomes legible only insofar as it confirms or disrupts an already established expectation of who belongs.
Early analyses of digital surveillance anticipated a more radical transformation than they could yet fully name. Writing in 2014, Robin James identified an emerging “acousmatic” condition in which listening detaches from any identifiable listener and disperses across systems of data capture and analysis. The 2013 Snowden disclosures make clear that this shift was not theoretical but already operational. State surveillance had moved from targeted interception to total capture, amassing communications indiscriminately and deriving “suspicion” only after the fact, as a pattern extracted from within the dataset itself. Listening no longer responds to a known object; it produces the object it claims to detect. What registers as “suspicious” does not precede analysis but materializes through algorithmic filtering, where signal and noise become effects of the system’s design rather than properties of the world. Under these conditions, listening ceases to function as a sensory or interpretive act and instead operates as an infrastructural logic of sorting, ranking, and preemption. Contemporary platforms extend and normalize this logic. They do not hear sound; they process it, rendering it actionable without ever encountering it as experience.

The essays collected in this series extend this transformation across distinct but interconnected domains, tracing how nonhuman listening operates through sound, speech, and platformed media. Across these accounts, listening no longer secures meaning or relation; it becomes a site of contestation, where sound is mobilized, processed, and weaponized within systems that privilege circulation, recognition, and response over truth. Next week, Olga Zaitseva-Herz situates these dynamics within the context of digital warfare, where AI-generated voices, deepfakes, and synthetic media circulate as instruments of psychological manipulation, designed to provoke affective responses that travel faster than verification.
Contemporary speech technologies make this continuity visible at the level of language itself. As work in the Racial Bias in Speech AI series shows, and as Michelle Pfeifer demonstrates in particular, speech technologies do not simply fail to recognize certain speakers; they formalize assumptions about what counts as intelligible language in the first place. In these systems, the voice is not encountered as expression but as input—something to be parsed, categorized, and aligned with existing datasets. When AI systems encounter African American Vernacular English—especially emergent idioms shaped by Black and queer communities—language is flattened into surface definitions, stripped of cultural grounding, or flagged as inappropriate. Speech is not heard as situated expression; it is processed as deviation from an unmarked norm.
What emerges is a form of hostile listening: not the misrecognition of a human listener, but a condition in which recognition is structurally foreclosed. Racialized language becomes perpetually at risk, mistrusted or excluded, not because it fails to communicate but because it exceeds the parameters through which the system can register meaning. Hate here is not expressive or intentional; it is procedural, embedded in the standards that determine what can be heard as language at all.
In this sense, the problem is not that listening has been replaced. It is that it continues—without exposure, without relation, without consequence for those who perform it. What appears as neutrality is the absence of risk. What appears as efficiency is the removal of encounters. Under these conditions, harm does not need to be spoken. It is heard into being in advance—stabilized as signal, confirmed as threat, and acted upon before it can be contested. The question that remains is not whether machines can learn to listen better. It is whether we can still recognize listening once it no longer requires us at all.
—
Kathryn Agnes Huether is a Postdoctoral Research Associate in Antisemitism Studies at UCLA’s Initiative to Study Hate and the Alan D. Leve Center for Jewish Studies. She earned her PhD in musicology with a minor in cultural studies from the University of Minnesota (2021) and holds a second master’s in religious studies from the University of Colorado Boulder. She has held visiting appointments at Bowdoin College and Vanderbilt University and was the 2021–2022 Mandel Center Postdoctoral Fellow at the United States Holocaust Memorial Museum.
Her research examines how sound mediates Holocaust memory, antisemitism, racial violence, and contemporary politics. She has published in Sound Studies and Yuval and has forthcoming work in the Journal of the Society for American Music and Music and Politics. She is a member of the Holocaust Educational Foundation of Northwestern University’s (HEFNU) Virtual Speakers Bureau, has been an invited educator at two of its regional institutes, and is the current editor of ISH’s public-facing blog. Her first book, Sounding Hate: Sonic Politics in the Age of Platforms and AI, is in progress. Her second, Sounding the Holocaust in Film, is a forthcoming teaching compendium that brings together key concepts in Holocaust studies with methods from film music and sound studies.
—
Series Icon designed by Alex Calovi
—

REWIND! . . .If you liked this post, you may also dig:
Your Voice is (Not) Your Passport—Michelle Pfeifer
“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens
Beyond the Every Day: Vocal Potential in AI Mediated Communication –Amina Abbas-Nazari
Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso
Stir It Up: From Polyphony to Multivocality in A Brief History of Seven Killings
For many, the audiobook is a source of pleasure and distraction, a way to get through the To Read Pile while washing dishes or commuting. Audiobooks have a stealthy way of rendering invisible the labor of creating this aural experience: the writer, the narrator, the producer, the technology…here at Sounding Out! we want to render that labor visible and, moreover, think of the sound as a focus of analysis in itself.
Over the next few weeks, we will host several authors who will make all of us think differently about the audiobook selections on our phone, in our car, and in our radios. Last week we listened to a book that listens to Dublin, in a post by Shantam Goyal. Today we have seven narrators telling us the story of an assassination attempt on Bob Marley. What will the audiobook whisper to us that the book cannot speak?
—Managing Editor Liana Silva
Reviews of A Brief History of Seven Killings, Marlon James’ 686-page rendering of the echoes of an assassination attempt on Bob Marley, almost invariably invoke the concept of polyphony to name its adroit use of multiple narrators. In The New York Times, Zachary Lazar maintained that the “polyphony and scope” of the 2014 novel made it much more than a saga of drug and gang violence stretching from 1970s Kingston to 1990s New York. And the Booker Prize, which James was the first Jamaican to win, similarly praised it as a “rich, polyphonic study,” with chief judge Michael Wood calling attention to the impressive “range of voices and registers, running from the patois of the street posse to The Book of Revelation.” It was thus not only the sheer number of voices in a preliminary three-page “Cast of Characters” that critics so unanimously admired but also the variety and nuance evident within them. Norwegian publisher Mime Books even took these polyphonic features a step further by hiring not one but twelve translators in a casting process that auditioned prominent novelists, playwrights, and performers.
James recalls realizing early on that this novel would be one “driven only by voice” (687), which might make such enthusiastic responses to its plurality of perspectives seem unsurprising. But what happens when such polyphony leaves the page behind and actual material voices drive its delivery? If the audiobook is a format of the novel (and here I follow Jonathan Sterne’s definition of format in MP3: The Meaning of a Format as “a whole range of decisions that affect the look, feel, experience, and workings of a medium” [7]), what lessons can listeners learn that print cannot provide? As I argue, the 26-hour-long audiobook version of A Brief History, which Highbridge Audio produced with seven actors (Robertson Dean, Cherise Boothe, Dwight Bacquie, Ryan Anderson, Johnathan McClain, Robert Younis, Thom Rivera), allows us to engage with multivocality rather than polyphony, which is to say the multiple vocal performances of a single individual rather than the presence of many narrators within a print work. And just as this novel’s polyphonic structure destabilizes any attempt at a definitive account of the events it portrays, the multifaceted performances of its audio format work to untrain ears that have been conditioned to hear necessary ties between voices and bodies.
Of course, this effect is not one that most listeners consciously seek, as reviews of the audiobook articulating various reasons for turning to this format as well as diverging responses to it readily attest. Gayle, on Audible, began with the print version: “but as soon as I got to the first chapter that was written in Jamaican patois I knew that I was not able to do that in my head and I was going to miss a lot.” Sound here conveys sense more swiftly than the page, the ear apparently better suited than the eye to encounter difference. (Woodsy, another reviewer, even felt emboldened to ventriloquize in text that sonically distinctive speech: “I found that listening to the Audible version was helpful. Now all me need do is stop thinking in Jamaican.”) Yet it was Andre who offered by far the most memorable characterization of the audiobook and its affordances. As he explained, in James’ novel “the language is a thick, tropical forest of words. Audiobook is the machete that slices through this forest of words so I can enjoy the treasures inside.” The violence of this metaphor matches that of the novel’s most disturbing scenes, yet what is most striking is the way it reiterates once more how reviewers found it easier to access the work aurally rather than visually.
These reviews, and other similarly favorable appraisals, rarely consider the audiobook on its own terms, insisting instead on comparisons with the text. Negative ones, however, often note distinctively sonic features, with some reviewers echoing one of the Booker judges—who reportedly consulted a Jamaican poet about the accuracy of James’ ear for dialogue—by questioning the veracity of the Jamaican accents in a novel that also features American, Colombian, and Cuban ones. Tending to readily identify themselves as Jamaican, these writers and listeners rarely acknowledge that at least some of the actors were born on the island when asserting that the accents are off. In any case, such efforts to link sound and authenticity, as Liana Silva has argued with respect to the audiobook, wrongly suggest that those who belong to a group must conform to a single sound. James, too, distrusts discourses of the authentic, as characters repeatedly cast suspicion and scorn on anyone uttering the phrase “real Jamaica.”
If the polyphony in James’ novel prevents any one perspective from becoming either representative or definitive, the audiobook pushes this process even further by demonstrating how a single performer’s voice can possess such range that it seems to contain multiple ones. Each performer is responsible for all the voices within the sections narrated by their primary characters, which means that the same character can occasionally be voiced by different actors. In one section, a performer does the voices of a tough-talking Chicago-born hitman and the jittery Colombians he speaks with in Miami; in others, that same performer is both a white Rolling Stone journalist from Minnesota who’s attuned to racial difference and the black Jamaicans he converses with in Kingston. Continuity or strict one-to-one correspondences between performer and character ultimately matter less than the displays of vocal difference that allow the audiobook to contest essentialized notions of voice.
As a result, the audiobook articulates just how constructed vocal divisions based on race, gender, and class are by having its performers constantly cross them. It amplifies the very arbitrariness of such divisions and thereby reveals how, if the page is the space of polyphony, then what the audiobook stages is multivocality. Although they might seem like synonyms, these two terms can actually help us appreciate crucial differences and, in doing so, highlight the specificity of the audio format. On the one hand, –phony or phōnē, as Shane Butler reminds us in The Ancient Phonograph, ambiguously refers to both voice and the human capacity for speech (36), whereas –vocality centers the voice. On the other, the shift from the Greek poly- to the Latin multi- signals a contrast in what gets counted: while polyphony names the quantity of perspectives contributing to a narrative (when introducing it in Problems of Dostoevsky’s Poetics, Mikhail Bakhtin emphasized that polyphony consisted of “a plurality of independent and unmerged voices and consciousnesses” [6]), multivocality instead specifies how the number of voices can exceed the number of performers. In this way, the concept of multivocality outlined here with respect to the audiobook resonates with its use in another context by Katherine Meizel, who mobilizes it with reference to singing and the borders of identity. In both cases, voice names a multiplicity of practices rather than an immutable or inevitable expression, which in turn aligns with Nina Sun Eidsheim’s argument in The Race of Sound about the voice being not singular but collective and not innate but cultural (9).
We can therefore say that where print-based polyphony works on the eye by placing various perspectives on a page without necessarily challenging visual perceptions of difference, multivocality in the audiobook can retrain an ear’s culturally ingrained ideas about voice. James himself has experience with these seemingly inescapable meanings assigned to vocal sounds. In a moving essay for The New York Times Magazine, he recounts how, even at the age of 28, “I was so convinced that my voice outed me as a fag that I had stopped speaking to people I didn’t know.” That was already long after high school, when, as he remembered in a New Yorker profile, he had begun “tape-recording his efforts to sound masculine, repeating words like ‘bredren’ and ‘boss.’” He was well aware of the links that listeners created between voice and identity and that could, as he suggests, prove risky in a place with overt homophobia like Jamaica. Writing, however, offered him a space to take on any voice and, at the same time, not be concerned with the sound of his own.
Yet if the page allowed James to effortlessly shift among narrative voices, the audiobook format exhibits voices that ostensibly shift without any effort. Perhaps the most compelling example emerges in the work of Cherise Boothe, whose performance of the novel’s sole female primary character presents the voices of other figures as well. Toward the end of the novel, this character, Dorcas Palmer, is a caretaker for a much older and wealthier white man with amnesia in New York. Boothe not only captures the changes as Palmer often eliminates her Jamaican accent and occasionally lets it loose but also registers the man’s moments of lucidity and confusion. Even if, as listeners, we understand that Boothe is the voice behind both of these characters, the two vocal performances are so distinct that they effectively erode the basis for any beliefs about how a certain body should sound.
Adopting different voices is certainly not unique to the audiobook, but it does provide one of the few forms of extended exposure to this practice. Yet it is worth noting that A Brief History markedly differs from the model of a more extensive cast like the one comprised of 166 voices that recorded George Saunders’ Lincoln in the Bardo. By assigning a performer to every character, such productions ultimately emphasize vocal uniqueness in roughly the same way that Adriana Cavarero conceives it, namely as an index of individuality. But there the voice remains something singular or somehow essential, for there is no opportunity to perform the plurality that appears across A Brief History. At the same time, the use of seven actors also offers a contrast with the opposite extreme: a single performer responsible for all the roles, which demonstrates multivocality but does so on such a small scale that it feels exceptional instead of ordinary. The middle ground, which is to say the model found in A Brief History, allows us to hear multiple instances of how the voice is entrained rather than essential, possibility rather than inevitability.

Screenshot from YouTube video “Marlon James: A Brief History of Seven Killings” by Chicago Humanities Festival
When briefly addressing audiobooks in an interview, James remarked that this format possesses a distinct advantage: “even something that is not necessarily plain can be translated because of tone and symbol and voice.” In other words, a voice can register its changing surroundings; conveying these subtle transformations on the page, however, is often far more difficult. This shortcoming is one that Edward Kamau Brathwaite once memorably described when explaining why he insisted on using a tape recorder in a lecture on language in the Caribbean: “I want you to get the sound of it, rather than the sight of it.” The idiomatic familiarity of the first half, which clashes so sharply with the awkwardness of the second, suggests that the multivocality of an audiobook can open ears by accentuating how the voice is not fixed but in constant formation.
—
Featured Image: “Audiobook” by Flickr user ActuaLitte, CC-BY-SA-2.0
—
Sam Carter is a PhD Candidate in Romance Studies at Cornell University. His work on literature and sound in the Southern Cone has appeared in Latin American Textualities: History, Materiality, and Digital Media and is forthcoming in the Revista Hispánica Moderna.
—
REWIND! . . .If you liked this post, you may also dig:
SO! Reads: Jonathan Sterne’s MP3: The Meaning of a Format–Aaron Trammell
Radio de Acción: Violent Circuits, Contentious Voices: Caribbean Radio Histories–Alejandra Bronfman
“Scenes of Subjection: Women’s Voices Narrating Black Death”–Julie Beth Napolin
ISSN 2333-0309