Tag Archive | John Cage

by Kathryn Huether
in Article, artificial intelligence, Capitalism, Cultural Studies, Digital Humanities, Digital Media, Hate & Non-Human Listening Series, Humanism, Identity, immigration and migration, Information, Internets, Language, Listening, Politics, Public Debate, Race, Rhetoric, social media, sonification, Sound, Sound Studies, Speech, Voice

Hate & Non-Human Listening, an Introduction

In January 2026, WIRED reported that U.S. Immigration and Customs Enforcement (ICE) has begun using Palantir’s AI tools to process public tip-line submissions. The system does not simply store or relay these reports. It processes English-language submissions, condensing them into what is called a “BLUF”—a “bottom line up front” summary that allows agents to quickly assess and prioritize cases.

Efficiency is the dominant framing: the system promises speed, clarity, and control over overwhelming volumes of information. Yet such efficiency depends on a prior reduction, in which expression is detached from the conditions of its articulation and reconstituted as data. In this form, listening no longer risks misunderstanding; it eliminates it.

Nor does this infrastructure operate in isolation. It relies on distributed participation in which listening is recast as vigilance. A recent ICE public X (Twitter) post encouraged residents to report “suspicious activity,” assuring them that doing so would make their communities safer.

The language is familiar, even reassuring. But it depends on a prior act of interpretation: that certain voices, presences, or behaviors are already legible as threat. Listening here becomes pre-classification—identifying danger in advance and acting on that identification as if it were already known. Rather than an isolated case, this development signals a broader transformation in how immigration and enforcement are governed. As legal and policy analyses increasingly note, artificial intelligence is becoming “one of the fundamental operating tools of policing,” deployed across domains ranging from speech and text analysis to risk assessment and document verification. Systems such as USCIS’s Evidence Classifier, which tags and prioritizes key documents within case files, and platforms like ImmigrationOS, which aggregate data across agencies to guide enforcement decisions, do not simply process information—they reorganize it. What matters is not only what is said, but whether it aligns—across time, across records, across bureaucratic expectations. Listening becomes continuous and anticipatory, oriented toward detecting inconsistency, deviation, and risk before any claim can be made or contested.

“Automated Species Recognition” CC BY 4.0

A very different narrative circulates alongside these developments. A recent BBC article suggested that AI chatbots can function as unusually “good listeners”—patient, nonjudgmental, even compassionate. Users describe these systems as offering space for reflection, sometimes preferring them to human interlocutors. Yet what is at work is not attention or relation, but pattern recognition trained to simulate understanding. Taken together, these examples reveal a shared transformation. Across both enforcement systems and everyday interaction, listening is increasingly detached from sensation, exposure, and accountability, becoming a process of extraction and classification rather than relation. As Dorothy Santos argues in her account of speech AI, machines do not simply assist human listening; they assume its position, becoming “the listeners to our sonic landscapes” while also acting as the capturers, surveyors, and documenters of our utterances. What follows from this shift is not just a change in who listens, but in what listening is. Listening no longer names an encounter between subjects; it describes a technical operation distributed across infrastructures that register, store, and act on sound without ever hearing it.

This shift is what I call “nonhuman listening.”

Nonhuman listening names both an infrastructural condition and a set of practices through which listening is reorganized as a technical operation. It describes a mode of perception distributed across systems that capture, process, and act on sound without exposure to it as experience, as well as the procedures—classification, ranking, prediction—through which sound is rendered actionable in advance. At stake is not simply the emergence of new technologies, but a reorganization of what listening has long been understood to do. Listening unfolds across thresholds of perception, attention, and care, shaped by what can be sensed, cultivated, or ignored. From its earliest formulations, it has been understood not as passive reception but as an ethically charged capacity. Aristotle’s distinction between akousis (hearing) and akroasis (listening) marks this divide, reserving listening for forms of attention capable of judgment and response. In this sense, listening has always named both openness and control: a posture of receptivity toward others and a way of organizing the world.

Nonhuman listening amplifies an older logic: not all voices are heard, and not all forms of speech register as meaning. Listening does not begin from neutrality. Norms organize it in advance, determining what registers as signal, who gets to hear, and whose speech counts as intelligible. Meaning and noise do not inhere in sound itself; they emerge through historically sedimented expectations about voice, difference, and belonging.

Sound studies has long challenged the assumption that listening inherently connects or humanizes. Listening does not operate as an immediate or intimate relation; it relies on frameworks that precondition perception. Jonathan Sterne shows that claims about sonic immediacy function less as empirical truths than as ideological formations—narratives that naturalize particular social arrangements while obscuring how listening renders some forms of speech legible and others unintelligible. Listening does not simply receive the world—it organizes it.

At the same time, theoretical and experimental approaches foreground the instability of this organization. Voices do not exist as stable entities prior to their mediation; they “show up as real,” as Matt Rahaim writes, through specific practices and infrastructures that render them intelligible, contested, or indeterminate. Jean-Luc Nancy conceptualizes listening as resonance, emphasizing exposure—the possibility that listening might unsettle the subject—while also underscoring that such openness never distributes evenly. John Cage and Pauline Oliveros treat listening as a disciplined practice that requires cultivation and can fail as easily as it attunes. Listening is not given; it is trained.

“Training Machine Listening” CC BY-NC 4.0

Across these accounts, listening operates within regimes of power. Jacques Attali locates listening within governance, where institutions determine what can be heard, what must be silenced, and what becomes disposable. Trauma and memory studies intensify these stakes. Henry Greenspan shows that listening to testimony never occurs as a singular or sufficient act, and that extractive modes of attention can reproduce violence rather than alleviate it. Ralina L. Joseph’s concept of radical listening reframes listening as an ethical orientation—one that demands accountability to power, difference, and fatigue, and that attends to how speakers wish to be heard. As she writes, “the easiest way to refuse to listen is to keep talking.”

Taken together, these accounts point to a more difficult claim: listening is not simply uneven but directional. It can orient toward exposure and relation, or toward certainty and verification. When listening turns toward certainty, it no longer encounters speech as an address. It apprehends speech in advance, so that certain voices register not as claims or appeals, but as warnings or threats.

Such orientation has precedents that are neither abstract nor metaphorical. During the 1937 Parsley Massacre, Dominican soldiers used pronunciation as a test of belonging. Suspected Haitians were asked to say the word perejil (parsley); those whose speech did not conform to expected phonetic norms were identified as foreign and often killed. Listening here did not register meaning or intent. It functioned as classification—reducing speech to a signal of difference and acting on that difference as if it were already known.

This logic persists in contemporary enforcement practices, albeit in different registers. Recent encounters with U.S. immigration agents reveal how accent continues to operate as a proxy for suspicion and a trigger for intervention. In multiple reported incidents, individuals have been stopped or detained and asked to account for their citizenship on the basis of how they sound: “Because of your accent,” one agent stated when asked to justify the demand for documentation. In another case, an agent explicitly linked auditory difference to disbelief, telling a driver, “I can hear you don’t have the same accent as me,” before repeatedly questioning where he was born.

In these moments, listening again operates as pre-classification. Accent is not heard as variation, history, or movement, but as evidence—an audible marker of non-belonging that precedes and justifies further scrutiny. What is at stake is not mishearing, but a mode of listening trained to stabilize difference as risk. Speech becomes legible only insofar as it confirms or disrupts an already established expectation of who belongs.

Early analyses of digital surveillance anticipated a more radical transformation than they could yet fully name. Writing in 2014, Robin James identified an emerging “acousmatic” condition in which listening detaches from any identifiable listener and disperses across systems of data capture and analysis. The 2013 Snowden disclosures make clear that this shift was not theoretical but already operational. State surveillance had moved from targeted interception to total capture, amassing communications indiscriminately and deriving “suspicion” only after the fact, as a pattern extracted from within the dataset itself. Listening no longer responds to a known object; it produces the object it claims to detect. What registers as “suspicious” does not precede analysis but materializes through algorithmic filtering, where signal and noise become effects of the system’s design rather than properties of the world. Under these conditions, listening ceases to function as a sensory or interpretive act and instead operates as an infrastructural logic of sorting, ranking, and preemption. Contemporary platforms extend and normalize this logic. They do not hear sound; they process it, rendering it actionable without ever encountering it as experience.

The essays collected in this series extend this transformation across distinct but interconnected domains, tracing how nonhuman listening operates through sound, speech, and platformed media. Across these accounts, listening no longer secures meaning or relation; it becomes a site of contestation, where sound is mobilized, processed, and weaponized within systems that privilege circulation, recognition, and response over truth. Next week, Olga Zaitseva-Herz situates these dynamics within the context of digital warfare, where AI-generated voices, deepfakes, and synthetic media circulate as instruments of psychological manipulation, designed to provoke affective responses that travel faster than verification.

Contemporary speech technologies make this continuity visible at the level of language itself. As work in the Racial Bias in Speech AI series shows, and as Michelle Pfeifer in particular demonstrates, speech technologies do not simply fail to recognize certain speakers; they formalize assumptions about what counts as intelligible language in the first place. In these systems, the voice is not encountered as expression but as input: something to be parsed, categorized, and aligned with existing datasets. When AI systems encounter African American Vernacular English, especially emergent idioms shaped by Black and queer communities, language is flattened into surface definitions, stripped of cultural grounding, or flagged as inappropriate. Speech is not heard as situated expression; it is processed as deviation from an unmarked norm.

What emerges is a form of hostile listening: not the misrecognition of a human listener, but a condition in which recognition is structurally foreclosed. Racialized language becomes perpetually at risk, mistrusted or excluded, not because it fails to communicate but because it exceeds the parameters through which the system can register meaning. Hate here is not expressive or intentional; it is procedural, embedded in the standards that determine what can be heard as language at all.

In this sense, the problem is not that listening has been replaced. It is that it continues—without exposure, without relation, without consequence for those who perform it. What appears as neutrality is the absence of risk. What appears as efficiency is the removal of encounters. Under these conditions, harm does not need to be spoken. It is heard into being in advance—stabilized as signal, confirmed as threat, and acted upon before it can be contested. The question that remains is not whether machines can learn to listen better. It is whether we can still recognize listening once it no longer requires us at all.

—

Kathryn Agnes Huether is a Postdoctoral Research Associate in Antisemitism Studies at UCLA’s Initiative to Study Hate and the Alan D. Leve Center for Jewish Studies. She earned her PhD in musicology with a minor in cultural studies from the University of Minnesota (2021) and holds a second master’s in religious studies from the University of Colorado Boulder. She has held visiting appointments at Bowdoin College and Vanderbilt University and was the 2021–2022 Mandel Center Postdoctoral Fellow at the United States Holocaust Memorial Museum.

Her research examines how sound mediates Holocaust memory, antisemitism, racial violence, and contemporary politics. She has published in Sound Studies and Yuval, and has forthcoming work in the Journal of the Society for American Music and Music and Politics. She is a member of the Holocaust Educational Foundation of Northwestern University’s (HEFNU) Virtual Speakers Bureau, has been an invited educator at two of its regional institutes, and is the current editor of ISH’s public-facing blog. Her first book, Sounding Hate: Sonic Politics in the Age of Platforms and AI, is in progress. Her second, Sounding the Holocaust in Film, is a forthcoming teaching compendium that brings together key concepts in Holocaust studies with methods from film music and sound studies.

—

Series Icon designed by Alex Calovi

—

REWIND! . . .If you liked this post, you may also dig:

Your Voice is (Not) Your Passport—Michelle Pfeifer

“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens

Beyond the Every Day: Vocal Potential in AI Mediated Communication –Amina Abbas-Nazari

Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso

August 10, 2015

by Maile Colbert
in Aesthetics, Digital Media, Film/Movies/Cinema, Interview, Listening, Music, Performance, Sound, Sound Studies, Technology, Visual Art

Aural Guidings: The Scores of Ana Carvalho and Live Video’s Relation to Sound

If you were to choose to watch live video composer and performer Ana Carvalho’s work silent, your brain would be easily guided into a synesthetic experience, assigning sounds to each rhythmic change in color, pace, frame. Her images oscillate…they dance, they breathe. As you experience this, there might be a sense that you have lost your ability to hear the outside world, as these images are clearly attached to, woven with, a part of sound.

There is a history of composers such as Iannis Xenakis and Cornelius Cardew using graphic scores and notation in place of traditional methods and symbols, as a way to reach a deeper expression through allowing greater interpretation, chance, and improvisation with their musicians. They concentrate more on conveying information on how a work is played, rather than what notes to play when. Carvalho uses the graphic score much in the same way, but also as a method of communication between live audio and live video performances, instructing a dialog between two disciplines that are often side-by-side or leaning on each other, but rarely woven together in the manner I have experienced both as audience, and as an audio composer, with her work.

The following interview has been edited for style.

Maile Colbert: Hi Ana, how are you today? And what are you working on currently?

Ana Carvalho: I am good. I’m working on a performance to present at the solstice, with Neil Leonard, and a text about the possibility of expansion of the mind through performing and fruition of being in an audiovisual performance.

At the moment the performance is still involved into misty possibilities of what we know of each other’s work, and what we have been developing individually and talking about. There will be saxophone, electronics, and visuals made of strange landscapes.

MC: At this stage in the process, when working with a sound maker such as Leonard, how do you think about the images’ relationship with the sound?

AC: Images come to be as they appear on the screen in two ways: first there is the introspection about what I have learned from the previous performances, what I want to explore further and what I don’t want to repeat. Secondly, there is the encounter with the other person and his or her work, and how I translate their sound into moving image. At some point my ideas change through the exchange and become something else, a visual performance that could only be presented with that sound, with that person, at that place and time.

Regarding the sound in particular, sometimes I propose a structure, or a score, to be followed by sound and image. Other times it is improvised. As I enjoy very much the process, I tend to like to make structures for the performances, which develop along drawings and texts.

MC: Your scores of text and image are quite beautiful, and of course I am personally lucky to both have some of the published ones, and have had a chance to work with them as well.

As a sound maker, I find they have a flow and almost narrative that feels both intentional and intuitive, no matter how abstract. When you make these, how much of a clear idea of how the audio would sound in relation to the images is in your mind? Do your images always or often follow the same score? Do the results surprise you?

AC: My state of arriving at the point of starting to make a composition is very much the same I described about working with a sound artist towards the presentation of a performance, that is, what do I want to develop further and what I want to leave behind? Then, what is particular to this new situation/performance/collaboration? The composition is a sort of a vehicle that connects process and performance, and that connects sound and image.

The composition is for the two mediums, sound and image, and they are considered in terms of composition as a unity made of two parts. They have to work in a conversational way. Imagine two close friends and how they would be talking to each other. I have in my imagination how it would be ideally two people in that position. That is how sound and image relate on the composition. I think in the most generic terms, more of intensity and flow rather than sonic results. For this reason any musician, with any possible (or invented) instrument, with whichever image or sound database, is able to play a score. It’s required though that the performers take time to reflect, that the composition is understood and incorporated in the way each one plays their instrument. The results are always a surprise.

MC: What inspired you to start working in this way with the scores, and when?

AC: My interest in making compositional scores, and from them documents related to my performance, has been inspired by conceptual and process based art and by my research on documentation of the ephemeral. The focus on the process highlights the need for other representations that are not the finished art object per se. These other ways of representation use available media to describe the making and the reflection while making. Mary Kelly’s Post-Partum Document (1973-79) is a very interesting example of what I am describing. She is expressing her feelings, the growing process of a child and an external perspective through visual objects displayed both as an exhibition and in book formats. Within audiovisual practice, I have been researching for the past five years on creative ways to make documents of the process. My attention was directed to composition in music. The influence of the composer Cornelius Cardew has been great, especially his work Treatise and his idea of directed improvisation. John Cage was also very important for his structures (in talks, texts and music), the use of the I Ching, and for bringing chance into composition.

Simultaneously, while studying and reflecting on these and other subjects, I realized that intuitively I make drawings, texts, and take photographs as a way to detach from the everyday and immerge into a creative process which eventually will lead to the concept and content of a performance.

Systematic Illusion – The Subtle Technique in an Earthquake Detector Construction has been so far the most complex project to include a score, as well as a series of photographs and the performance. It was presented in its complete form just once in the curatorial project Decalcomania. Organizing these elements as composition, creating a score, and from all this to make little books has been a way of putting into practice my research interests.

A result from the construction of the score has been that the process doesn’t stop with one performance, as I then use the same scores to perform with different artists. For example with the score from this project I created the Earthquake Detector performance series within which we presented the performance together in São Paulo in 2013 at the event Arranjos Experimentais.

MC: As you speak about your work, I keep thinking about Robert Bresson’s Notes on Sound and how most of his notes refer to variations of not letting sound or image take over each other, but to weave them together within the composition. In his Notes on the Cinematographer, he also wrote number “10. not to use two violins when one is enough”. What might this mean to you in relation to your collaborations?

AC: One very inspiring event in the attempts I make to construct a formal grammar and way for me to address collaborations has been to see the film Passage Through: A Ritual by Stan Brakhage. The film was screened at Serralves Museum, here in Porto, Portugal, in June 2011. The event addressed the collaboration and improvisation of music in relationship with cinema from the work of the composers Malcolm Goldstein and Philip Corner and their music in the film work of Daïchi Saïto and Stan Brakhage.

The most amazing thing in Passage Through: A Ritual was to watch such a beautiful film made of an abundance of black screen, that is, an absence of visual form, of light and movement. Each appearance of image was a precious moment. What I learned is that the visual emptiness, and sonic as well, contains information when stimulated through glimpses of image, making the experience of seeing and listening very deep. In his Notes, Bresson sums up this appropriateness of means and complex connections in a simple and clear sentence.

On the scores I make most of the information can be used for both sound and image. On the Refractive Composition, the scale of greys is for image. That was its purpose when the score was performed the first time. But if someone decides to play from the score without knowing anything about it beforehand, and lacking this intention (which was relevant when it was made, but afterwards there was the decision to not leave it as a declared instruction), that person can also interpret the same information as sound.

MC: Alchemy has been described as: “…the chemistry of the subtlest kind which allows one to observe extraordinary chemical operations at a more rapid pace; ones that require a long time for nature to produce” (Paul-Jacques Malouin, Alchimie). Looking from a history of cinema, there is a tradition and pattern of picture coming before sound, a hierarchy that is both in process and production. You often feel this as an audience. Your work and collaboration have a quality of sound and image having been born together. Having worked with you using your scores, I was likening it to being given a recipe where you have before you ingredients and suggestions, but there is room for your own improvisations and reading. Perhaps that is where that feeling of both disciplines coming together in a manner that feels like they are one part of a whole, rather than separate but leaning on each other, comes from.

Is there an alchemical element to this work, or are you seeking one out? And in that regard, are your scores like a recipe?

AC: What I am seeking with my work can relate to alchemy as experiments and attempts in the quest for depth in all things at the point where differences and frontiers become undefined and irrelevant (in communication between beings, in areas of knowledge, between matter and energy). This quest for depth is based on stubborn curiosity towards evolution as a person, and as part of the world. Perhaps there isn’t an alchemical element to the work, but rather a connection with alchemy, in the ways scores relate to the experiments as recipes to be shared with others in construction and change. This takes me to another aspect of composition. It is difficult for me to understand live image as just accompaniment to a music performance, and vice versa. This is perhaps central to my composition, and the reason why I am doing it for sound and image, to be able to perform that intertwined.

If we look at compositions as recipes, their aim will be to set the performers in tune with each other in the construction of a performance, to set sound and image in dialogue, and to permit a multisensory experience. I have been trying to get other artists interested in performing my scores. To perform from another artist’s score may be very common in music but is unheard of in live visuals. I have as an objective to make a change in that, but for now I perform the scores with sound artists. With the Earthquake Detector series I asked sound artists to read from the score and perform with me. So far, I have presented this as performance with Jeremy Slater, Ben Owen, and with you. Because each artist has a very different approach to the score and reads it in their own way, the processes have been very different. For example, Ben Owen made visual reinterpretations of the structure, and you experimented with and without voice (reading of the text in the score); the results are therefore equally different.

MC: Aside from your scores, could you speak about your relationship to the sound you work with, and sound makers you work with, in the live moment of performance?

AC: Looking for a sound artist to work together on a specific piece or to interpret with me a composition comes from a need to transcend my individual perception of the world and to perceive it with others. It is, again, the curiosity to know the world. The only time I made a complete sound piece on my own was for the performance Vista II – Montanha presented last May (as part of Semana Andrómeda, in Maus Hábitos, Porto). Aside from this really interesting experience, each collaboration is different, every process is different, because each musician is a different person with different skills and sees the relationship between sound and image in different ways.

I am very thankful for everything I have learned with each collaboration and all the intensity with each performance but, as always, it’s the curiosity and the need to explore new frontiers that makes me move from one project to the next. It is also the possibilities that intuitively present themselves as challenges and the “what if…” in variations.

—

Ana Carvalho is a live video composer and performer, and writes on subjects related to live audiovisual performance. She is a doctor of Communication and Digital Platforms from FLUP (Faculdade de Letras da Universidade do Porto). Her thesis is “Materiality and the Ephemeral: Identity and Performative Audiovisual Arts, its Documentation and Memory Construction.” Currently, she holds a position as invited lecturer at the ISMAI (Instituto Universitário da Maia). For more on her work, visit: http://cargocollective.com/visual-agency/About.

—

Maile Colbert is a multi-media artist with a concentration on sound and video who relocated from Los Angeles, US to Lisbon, Portugal. She is a regular writer for Sounding Out!

—

All images courtesy of the author.

—

REWIND! . . .If you liked this post, you may also dig:

Playing with Bits, Pieces, and Lightning Bolts: An Interview with Sound Artist Andrea Parkins — Maile Colbert

Sound as Art as Anti-environment — Steven Hammer

Live Electronic Performance: Theory and Practice — Primus Luta

Sounding Out!

ISSN 2333-0309
