- by Kathryn Huether
Hate & Non-Human Listening, an Introduction

In January 2026, WIRED reported that U.S. Immigration and Customs Enforcement (ICE) has begun using Palantir’s AI tools to process public tip-line submissions. The system does not simply store or relay these reports. It processes English-language submissions, condensing them into what is called a “BLUF”—a “bottom line up front” summary that allows agents to quickly assess and prioritize cases.
Efficiency is the dominant framing: the system promises speed, clarity, and control over overwhelming volumes of information. Yet such efficiency depends on a prior reduction, as expression is detached from the conditions of its articulation and reconstituted as data. In this form, listening no longer risks misunderstanding; it eliminates it.
Nor does this infrastructure operate in isolation. It relies on distributed participation in which listening is recast as vigilance. A recent ICE public X (Twitter) post encouraged residents to report “suspicious activity,” assuring them that doing so would make their communities safer.
The language is familiar, even reassuring. But it depends on a prior act of interpretation: that certain voices, presences, or behaviors are already legible as threat. Listening here becomes pre-classification—identifying danger in advance and acting on that identification as if it were already known. Rather than an isolated case, this development signals a broader transformation in how immigration and enforcement are governed. As legal and policy analyses increasingly note, artificial intelligence is becoming “one of the fundamental operating tools of policing,” deployed across domains ranging from speech and text analysis to risk assessment and document verification. Systems such as USCIS’s Evidence Classifier, which tags and prioritizes key documents within case files, and platforms like ImmigrationOS, which aggregate data across agencies to guide enforcement decisions, do not simply process information—they reorganize it. What matters is not only what is said, but whether it aligns—across time, across records, across bureaucratic expectations. Listening becomes continuous and anticipatory, oriented toward detecting inconsistency, deviation, and risk before any claim can be made or contested.
A very different narrative circulates alongside these developments. A recent BBC article suggested that AI chatbots can function as unusually “good listeners”—patient, nonjudgmental, even compassionate. Users describe these systems as offering space for reflection, sometimes preferring them to human interlocutors. Yet what is at work is not attention or relation, but pattern recognition trained to simulate understanding. Taken together, these examples reveal a shared transformation. Across both enforcement systems and everyday interaction, listening is increasingly detached from sensation, exposure, and accountability, becoming a process of extraction and classification rather than relation. As Dorothy Santos argues in her account of speech AI, machines do not simply assist human listening; they assume its position, becoming “the listeners to our sonic landscapes” while also acting as the capturers, surveyors, and documenters of our utterances. What follows from this shift is not just a change in who listens, but in what listening is. Listening no longer names an encounter between subjects; it describes a technical operation distributed across infrastructures that register, store, and act on sound without ever hearing it.
This shift is what I call “nonhuman listening.”
Nonhuman listening names both an infrastructural condition and a set of practices through which listening is reorganized as a technical operation. It describes a mode of perception distributed across systems that capture, process, and act on sound without exposure to it as experience, as well as the procedures—classification, ranking, prediction—through which sound is rendered actionable in advance. At stake is not simply the emergence of new technologies, but a reorganization of what listening has long been understood to do. Listening unfolds across thresholds of perception, attention, and care, shaped by what can be sensed, cultivated, or ignored. From its earliest formulations, it has been understood not as passive reception but as an ethically charged capacity. Aristotle’s distinction between akousis (hearing) and akroasis (listening) marks this divide, reserving listening for forms of attention capable of judgment and response. In this sense, listening has always named both openness and control: a posture of receptivity toward others and a way of organizing the world.
Nonhuman listening amplifies an older logic: not all voices are heard, and not all forms of speech register as meaning. Listening does not begin from neutrality. Norms organize it in advance, determining what registers as signal, who gets to hear, and whose speech counts as intelligible. Meaning and noise do not inhere in sound itself; they emerge through historically sedimented expectations about voice, difference, and belonging.
Sound studies has long challenged the assumption that listening inherently connects or humanizes. Listening does not operate as an immediate or intimate relation; it relies on frameworks that precondition perception. Jonathan Sterne shows that claims about sonic immediacy function less as empirical truths than as ideological formations—narratives that naturalize particular social arrangements while obscuring how listening renders some forms of speech legible and others unintelligible. Listening does not simply receive the world—it organizes it.
At the same time, theoretical and experimental approaches foreground the instability of this organization. Voices do not exist as stable entities prior to their mediation; they “show up as real,” as Matt Rahaim writes, through specific practices and infrastructures that render them intelligible, contested, or indeterminate. Jean-Luc Nancy conceptualizes listening as resonance, emphasizing exposure—the possibility that listening might unsettle the subject—while also underscoring that such openness never distributes evenly. John Cage and Pauline Oliveros treat listening as a disciplined practice that requires cultivation and can fail as easily as it attunes. Listening is not given; it is trained.

Across these accounts, listening operates within regimes of power. Jacques Attali locates listening within governance, where institutions determine what can be heard, what must be silenced, and what becomes disposable. Trauma and memory studies intensify these stakes. Henry Greenspan shows that listening to testimony never occurs as a singular or sufficient act, and that extractive modes of attention can reproduce violence rather than alleviate it. Ralina L. Joseph’s concept of radical listening reframes listening as an ethical orientation—one that demands accountability to power, difference, and fatigue, and that attends to how speakers wish to be heard. As she writes, “the easiest way to refuse to listen is to keep talking.”
Taken together, these accounts point to a more difficult claim: listening is not simply uneven—it is directional. It can orient toward exposure and relation, or toward certainty and verification. When listening turns toward certainty, it no longer encounters speech as an address. It apprehends it in advance, so that certain voices register not as claims or appeals but as warnings or threats.
Such orientation has precedents that are neither abstract nor metaphorical. During the 1937 Parsley Massacre, Dominican soldiers used pronunciation as a test of belonging. Suspected Haitians were asked to say the word perejil (parsley); those whose speech did not conform to expected phonetic norms were identified as foreign and often killed. Listening here did not register meaning or intent. It functioned as classification—reducing speech to a signal of difference and acting on that difference as if it were already known.
This logic persists in contemporary enforcement practices, albeit in different registers. Recent encounters with U.S. immigration agents reveal how accent continues to operate as a proxy for suspicion and a trigger for intervention. In multiple reported incidents, individuals have been stopped or detained and asked to account for their citizenship on the basis of how they sound: “Because of your accent,” one agent stated when asked to justify the demand for documentation. In another case, an agent explicitly linked auditory difference to disbelief, telling a driver, “I can hear you don’t have the same accent as me,” before repeatedly questioning where he was born.
In these moments, listening again operates as pre-classification. Accent is not heard as variation, history, or movement, but as evidence—an audible marker of non-belonging that precedes and justifies further scrutiny. What is at stake is not mishearing, but a mode of listening trained to stabilize difference as risk. Speech becomes legible only insofar as it confirms or disrupts an already established expectation of who belongs.
Early analyses of digital surveillance anticipated a more radical transformation than they could yet fully name. Writing in 2014, Robin James identified an emerging “acousmatic” condition in which listening detaches from any identifiable listener and disperses across systems of data capture and analysis. The 2013 Snowden disclosures made clear that this shift was not theoretical but already operational. State surveillance had moved from targeted interception to total capture, amassing communications indiscriminately and deriving “suspicion” only after the fact, as a pattern extracted from within the dataset itself. Listening no longer responds to a known object; it produces the object it claims to detect. What registers as “suspicious” does not precede analysis but materializes through algorithmic filtering, where signal and noise become effects of the system’s design rather than properties of the world. Under these conditions, listening ceases to function as a sensory or interpretive act and instead operates as an infrastructural logic of sorting, ranking, and preemption. Contemporary platforms extend and normalize this logic. They do not hear sound; they process it, rendering it actionable without ever encountering it as experience.

The essays collected in this series extend this transformation across distinct but interconnected domains, tracing how nonhuman listening operates through sound, speech, and platformed media. Across these accounts, listening no longer secures meaning or relation; it becomes a site of contestation, where sound is mobilized, processed, and weaponized within systems that privilege circulation, recognition, and response over truth. Next week, Olga Zaitseva-Herz situates these dynamics within the context of digital warfare, where AI-generated voices, deepfakes, and synthetic media circulate as instruments of psychological manipulation, designed to provoke affective responses that travel faster than verification.
Contemporary speech technologies make this continuity visible at the level of language itself. As work in the Racial Bias in Speech AI series shows, particularly as Michelle Pfeifer demonstrates, speech technologies do not simply fail to recognize certain speakers; they formalize assumptions about what counts as intelligible language in the first place. In these systems, the voice is not encountered as expression but as input—something to be parsed, categorized, and aligned with existing datasets. When AI systems encounter African American Vernacular English—especially emergent idioms shaped by Black and queer communities—language is flattened into surface definitions, stripped of cultural grounding, or flagged as inappropriate. Speech is not heard as situated expression; it is processed as deviation from an unmarked norm.
What emerges is a form of hostile listening: not the misrecognition of a human listener, but a condition in which recognition is structurally foreclosed. Racialized language becomes perpetually at risk, mistrusted or excluded, not because it fails to communicate but because it exceeds the parameters through which the system can register meaning. Hate here is not expressive or intentional; it is procedural, embedded in the standards that determine what can be heard as language at all.
In this sense, the problem is not that listening has been replaced. It is that it continues—without exposure, without relation, without consequence for those who perform it. What appears as neutrality is the absence of risk. What appears as efficiency is the removal of encounters. Under these conditions, harm does not need to be spoken. It is heard into being in advance—stabilized as signal, confirmed as threat, and acted upon before it can be contested. The question that remains is not whether machines can learn to listen better. It is whether we can still recognize listening once it no longer requires us at all.
—
Kathryn Agnes Huether is a Postdoctoral Research Associate in Antisemitism Studies at UCLA’s Initiative to Study Hate and the Alan D. Leve Center for Jewish Studies. She earned her PhD in musicology with a minor in cultural studies from the University of Minnesota (2021) and holds a second master’s in religious studies from the University of Colorado Boulder. She has held visiting appointments at Bowdoin College and Vanderbilt University and was the 2021–2022 Mandel Center Postdoctoral Fellow at the United States Holocaust Memorial Museum.
Her research examines how sound mediates Holocaust memory, antisemitism, racial violence, and contemporary politics. She has published in Sound Studies and Yuval and has forthcoming work in the Journal of the Society for American Music and in Music and Politics. She is a member of the Holocaust Educational Foundation of Northwestern University’s (HEFNU) Virtual Speakers Bureau, has been an invited educator at two of its regional institutes, and is the current editor of ISH’s public-facing blog. Her first book, Sounding Hate: Sonic Politics in the Age of Platforms and AI, is in progress. Her second, Sounding the Holocaust in Film, is a forthcoming teaching compendium that brings together key concepts in Holocaust studies with methods from film music and sound studies.
—
Series Icon designed by Alex Calovi
—

REWIND! . . .If you liked this post, you may also dig:
Your Voice is (Not) Your Passport—Michelle Pfeifer
“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens
Beyond the Every Day: Vocal Potential in AI Mediated Communication –Amina Abbas-Nazari
Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso
Press Play and Lean Back: Passive Listening and Platform Power on Nintendo’s Music Streaming Service
I remember long car rides as a kid in the early 2000s, headphones on, gazing out the window at the passing scenery while looping background music from The Legend of Zelda and Pokémon games on my Game Boy. After school, I’d occasionally throw the Super Smash Bros. Melee soundtrack on my Discman CD player to keep me motivated while doing homework. Like many others, I found Nintendo’s music to be an effective accompaniment to everyday activities, a kind of functional listening long before streaming platforms like Spotify and YouTube made it trendy. Which raises the question: how has Nintendo adapted to the streaming age?
Unlike many other game publishers, Nintendo has conspicuously kept its music off streaming services—despite having some of the most recognizable soundtracks in video game history, such as Super Mario Bros., Donkey Kong, and Metroid. Instead, the company took a different direction by unveiling its own music streaming service in October 2024, aptly titled Nintendo Music. The platform, available to Nintendo Switch Online subscribers, showcases soundtracks spanning the company’s history, from 1980s NES titles to recent Nintendo Switch 2 releases.
In a listening landscape dominated by Spotify, Apple Music, and YouTube Music, Nintendo’s decision to launch its own proprietary streaming service makes it unique among video game companies. This move is idiosyncratic in a way that feels characteristically Nintendo, but it is also a bold bid to compete in the broader attention economy. By situating itself alongside, rather than within, the major music streaming services, Nintendo signals that its soundtracks are valuable cultural content worth curating and controlling directly.
Nintendo Music caters specifically to video game fans by including screenshots with each track, having a “Spoiler” filter that lets users block music from games they haven’t played, and making personalized recommendations based on each user’s play history. But perhaps most notable is its emphasis on background listening: through features like mood playlists and an “Extend” tool, video game music is explicitly framed as a companion for contexts like relaxing, working out, or doing household chores.
By repurposing game soundtracks as tools for everyday routines, Nintendo Music capitalizes on nostalgia and contemporary listening habits to deepen fan engagement and retain control over its brand—a strategic move from a company that is famously (over)protective of its intellectual property. More generally, it also reflects neoliberal logics in which music is woven into daily life to regulate mood and productivity, revealing the increasing reach of digital platforms over how we work, listen, and live.
Listening in Loops: Video Game Music in the Background
In advertisements for Nintendo Music, actors hum and sing along to famous video game tunes while carrying out their daily activities. “Whether you’re grocery shopping, straightening up at home, or getting some studying done, Nintendo Music can be the background sound to your everyday life,” the description to one video reads.
This marketing is strikingly similar to strategies by streaming services such as Spotify, which encourage listening to music in any and every context. Playlists based around specific moods or activities—like Spotify’s “Gym Hits,” “Intense Studying,” and “sad girl starter pack”—use music as a tool to manage listeners’ energy levels, focus, and emotions as they go about their lives. Anahid Kassabian’s concept of “ubiquitous listening” helps describe this phenomenon, showing how even passive, background engagement can shape listeners’ affects and experiences.
In many ways, video game music is ideal for the ubiquitous listening that streaming services promote. Game soundtracks are generally (though not always) designed for the background and are usually instrumental, setting the emotional tone of on-screen action, from serene soundscapes to intense boss battles. Unlike other multimedia soundtracks, such as film scores, much video game music is also composed to loop indefinitely, making it especially effective for sustained listening.
As Michiel Kamp demonstrates in Four Ways of Hearing Video Game Music, “background listening” is one of the main ways users experience video game soundtracks. He writes that “background music both in games and elsewhere requires us to be so attuned to it that it offers no experiential friction in need of interpreting, and through this it has the capacity to attune us to our environment, be it a mythical underworld full of dangers or a convenience store full of groceries” (2024, 175).
While Kamp primarily focuses on background listening while playing games, game music can attune listeners to moods, activities, or environments even when heard outside of gameplay. In fact, video games train us to listen in this way, using music to establish the appropriate affect for narrative events, settings, and characters. These immersive qualities have made video game music immensely popular on streaming services: soundtracks from games and franchises like Halo, Final Fantasy, The Elder Scrolls, Undertale, and Minecraft have collectively garnered over a billion streams on Spotify alone.
But Nintendo, by launching its own proprietary platform, trades streaming royalties and wider exposure for something arguably more valuable: the ability to control how and where fans experience its content.
Features in Focus: Nintendo Music’s Approach to Passive Listening
Nintendo Music’s features illustrate how the service adapts soundtracks for continuous, everyday listening. Perhaps most notable is the service’s unique Extend feature, which allows users to stretch the runtime of tracks up to 60 minutes. Described in the app as “the perfect accompaniment to studying or working,” this feature facilitates seamless background listening without the distraction of frequent track changes. So if you’ve ever wanted to loop the Wii Shop music for a full hour—and let’s be honest, who hasn’t—now you can.


Alongside complete soundtracks, Nintendo Music also foregrounds curated playlists, including those based around specific video game characters, themes, and moods. The “Powering Up” playlist features “up-tempo tracks to fill you with energy,” for example, while “Good Night” has “down-tempo tracks to help you drift into dreamland.” Screenshots for each track further immerse listeners, visually reinforcing the moods and environments the music is designed to evoke. On these playlists, Nintendo’s music is presented less as individual compositions and more as “vibes.”

Packaging music around moods or vibes is not a neutral act. In Mood Machine: The Rise of Spotify and the Costs of the Perfect Playlist, Liz Pelly asserts that “organizing music by mood is a way to transform it into a new type of media product. It is about selling users not just on moods, but on the promise of the very concept that mood stabilization is something within their control. It’s a tactic for luring users to double click and start streaming” (2025, 40). Pelly’s observation underscores that mood-based playlists do more than entertain: they are a way for platforms to influence how listeners organize their time and attention.
Furthermore, Nintendo Music’s approach positions music not only as a creative or cultural artifact, but also as a commodified resource for self-regulation. This aligns with Eric Drott’s claim that streaming services often employ music as a “technology of social reproduction,” used to structure and maintain day-to-day existence. For Drott, this is “part of a broader tendency under neoliberal capitalism that prizes music, the arts, and culture not on account of their aesthetic worth but on account of their ‘expediency’ for other social, political, and economic ends” (2024, 197).
Many users still actively listen to their favourite Nintendo soundtracks on the platform, and there’s also nothing inherently wrong with background listening—it’s how much of this music was originally designed to be heard. However, presenting music as an aid to concentration, productivity, or mood regulation also risks repurposing soundtracks as a form of “neo-Muzak,” a vehicle for continuous consumption designed to keep listeners plugged into Nintendo’s broader product ecosystem.
Background Benefits: Nintendo’s Platform Power
Beyond guiding listening habits, Nintendo Music reinforces the company’s brand image of nostalgia, innovation, and family-friendly fun while increasing engagement with its intellectual property on its own terms. As a Nintendo spokesperson said in an interview with Nippon TV News, “To increase the number of people who have access to Nintendo IP, we believe that game music is an important and valuable form of content. Nintendo Music is a service that allows us to deliver this game music in a way that is uniquely Nintendo. . . . We hope that Nintendo Music will help you recall some of your favorite gaming experiences and think that it will also encourage people to play the games again” (translation by Nicholas Anderson).
Nintendo’s efforts to centralize its music are also likely, at least in part, a response to fans unofficially circulating soundtracks online. As part of a broader trend of functional music compilations (think lofi beats to study/relax to), YouTube hosts countless user-generated Nintendo music playlists designed for activities such as studying and sleeping. Despite Nintendo’s notoriety for issuing takedown notices over copyright infringement—including shutting down the massively popular YouTube video game music channel GilvaSunner in 2022—many of these unofficial videos and reuploads continue to accrue millions of views.
By providing an official home for soundtracks and its own contextual playlists, Nintendo Music is a subtle exercise in platform power, gating access to subscribers. It redirects listeners from other platforms, letting Nintendo control its content without diluting its brand on third-party services. Although Nintendo Music’s catalogue is currently slim—as of writing it has roughly 100 soundtracks—the company continues to trickle out new music most weeks, incentivizing listeners to keep coming back.
Nintendo Music promotes ongoing background listening not only to attract users who are already accustomed to mood and activity playlists, then, but also to keep them on the platform and connected to the company’s games and services. After all, every minute a listener spends on Nintendo Music looping David Wise’s “Aquatic Ambiance” from Donkey Kong Country is a minute they aren’t spending on YouTube, Spotify, or any other entertainment platform.
* * *
Video game music is, in many respects, perfectly suited for the streaming age. From the popularity of playlists to the ascent of ambient music, streaming services’ focus on passive listening aligns with the background function of video game soundtracks. As we’ve seen, Nintendo Music takes full advantage of this, using its marketing and features to bolster branding, solidify control over IP, and encourage engagement.
For many, Nintendo Music offers an enjoyable experience and a convenient way to stream nostalgic soundtracks. But the service also exposes how proprietary platforms concentrate power and leverage passive listening for ongoing consumption, reinforcing broader patterns where work and leisure become intertwined with corporate interests. By prompting users to integrate Nintendo’s music into their activities, the platform extends the reach of its games beyond the screen and into daily life.
Whether you’re listening to famed composer Koji Kondo or everyone’s favourite troubadour dog K.K. Slider, Nintendo’s message is clear: press play and lean back.
—
Featured Image: “Mario Kart” by MIKI Yoshihito (#mikiyoshihito), CC BY 2.0
—
Ryan Blakeley is Visiting Assistant Professor at Northeastern University and holds a PhD in Musicology from the Eastman School of Music. His research investigates how digital platforms like music streaming services are shaping creative practices, listening habits, and music industry power dynamics.
—
REWIND!…If you liked this post, you may also dig:

Video Gaming and the Sonic Feedback of Surveillance: Bastion and the Stanley Parable–Aaron Trammell
Playing with the Past in the Imagined Middle Ages: Music and Soundscape in Video Games–James Cook
Beyond the Grave: The “Dies Irae” in Video Game Music–Karen Cook
Sounding Out! Podcast #29: Game Audio Notes I: Growing Sounds for Sim Cell–Leonard J. Paul
Papa Sangre and the Construction of Immersion in Audio Games— Enongo Lumumba-Kasongo
ISSN 2333-0309