Mimicked Voices and Nonhuman Listening: Deepfakes, Speech, and Sonic Manipulation in the Digital War on Ukraine

The essays collected in this series (link to the Introduction) trace how nonhuman listening operates through sound, speech, and platformed media across distinct but interconnected domains. Across these accounts, listening no longer secures meaning or relation; it becomes a site of contestation, where sound is mobilized, processed, and weaponized within systems that privilege circulation, recognition, and response over truth. In this contribution, Olga Zaitseva-Herz examines how nonhuman listening operates under conditions of war, where AI-generated voices and deepfakes destabilize the very grounds of auditory trust. Through the case of Ukraine, she shows how platforms and political actors alike exploit algorithmic listening systems to amplify affect, circulate disinformation, and transform voice into a tool of psychological warfare. Listening, in this context, becomes not a means of understanding but a terrain of uncertainty. –Guest Editor Kathryn Huether

Russia’s full-scale invasion of Ukraine has unfolded as the most digitally mediated war to date, shaped not only by what circulates online but by how content is heard, interpreted, and amplified. Here, listening is not limited to human hearing: it also includes algorithmic systems that detect, rank, and amplify content, as well as political actors and online publics who interpret and recirculate it. Social media platforms—Telegram, Instagram, TikTok, Facebook—have become sites of psychological warfare where AI-generated audio, video, text, and image-based content are crafted to manipulate perception and provoke rapid emotional responses, often through algorithmic systems attuned to virality and affect. Ukrainian political authorities regularly caution users that everything one reads, hears, or sees could be a psychological weapon. This is not rhetorical. Content is often designed to produce outrage, shock, and despair—emotions that travel quickly across platforms and influence public mood.

AI is used to create fake news videos, synthetic voices, and deepfake conversations, complicating how authenticity is heard and assessed. Some recordings circulating on platforms such as Telegram, Instagram, and Facebook simulate “leaked” phone calls revealing political dissent or strategic plans. At the same time, because genuine voices can now be convincingly synthesized, anyone can claim that a recording of their own voice is AI-generated. A widely circulated case involved Russian music producer Iosif Prigozhin, whose alleged call criticizing the Kremlin provoked significant backlash. Soon after, he claimed the recording was an AI forgery—a statement whose truth remains unclear, but which strategically exploits growing public awareness of deepfakes as a means of discrediting or distancing oneself from damaging material. Deepfakes thus do not merely deceive; they destabilize the conditions of listening and trust, turning listening itself into a site of strategic uncertainty in which any voice can be disavowed as synthetic. Against this backdrop, music and voice emerge as especially powerful media for manipulation, parody, retaliation, and symbolic struggle.

Graafika. Kuulaja. by Avo Keerend – 1980 – Pärnu Museum, via E-Varamu, Estonia – CC0 1.0.

AI Songs as a Tool of Revenge

AI generative tools are also used for irony and parody, as in the viral remake “Samotni Moskali” [Lonely Muscovites], which mocks the Ukrainian pop star Ani Lorak, who moved to Russia. On November 13, 2023, Ukrainian journalist and politician Anton Gerashchenko’s Telegram channel posted a video remake of Lorak’s old song “Poludneva Speka” [Midday Heat], renamed “Samotni Moskali.” The video quickly went viral on social media. Her big hit from the ’00s was remade into strongly pro-Ukrainian content, featuring clips from current frontlines to illustrate new lyrics performed by an AI voice engineered to closely mimic Lorak’s vocal timbre and affect. The parody relies on listeners’ recognition of her voice and affective style, while the imitation introduces a sharp shift in content between the original and synthetic lyrics.

This social media burst was a response to Ani Lorak’s claimed political neutrality in the context of Russia’s full-scale war against Ukraine, despite clear signs of her support for Russia. Her Ukrainian audience felt betrayed, and the remake seemed aimed both at revenge and at staging a public breakup with her Ukrainian fan base, making the consequences of her choices visible. Many satirical memes followed, including AI-generated songs tied to her stage persona—made in the knowledge that, under current Russian politics, she could get into trouble there if the government took the fabricated ‘support’ for the Ukrainian army seriously. The revenge group went even further, creating a homepage called the “Ani Lorak Foundation,” dedicated entirely to fundraisers for the Ukrainian army and presented as Lorak’s own project showcasing her support of Ukrainian battalions. Some military drones deployed by the Ukrainian side even ended up bearing stickers with the name of the “Ani Lorak Foundation.” This case demonstrates how AI tools became instruments of public satire, sabotage, and protest in the context of the current full-scale war.

AI Songs as a Weapon

During the full-scale invasion, Russia has been using AI-generated music as a weapon for propaganda and disinformation. In 2023, multiple songs in Ukrainian were created to disrupt Ukraine’s military mobilization efforts and went viral. One of these, the song “Mamo, Ia Ukhyliant” [Mother, I am a Draft Dodger], became particularly popular in a multitude of variations. Their circulation shows how platforms “listen” to wartime content through metrics of repetition, provocation, and affective intensity, amplifying messages not because they are true, but because they are likely to generate reaction and spread. These songs were algorithmically promoted on TikTok and successfully sparked a viral challenge aimed at undermining Ukraine’s mobilization in 2024 by encouraging Ukrainian men to evade the draft, flee, and party abroad instead. In response, Ukrainian intelligence released an official statement identifying these songs as products of a Russian disinformation campaign.

This example shows how AI-generated songs are actively used as powerful tools of war, spreading political messages and influencing people’s political choices. The fact that all these songs about draft evasion were released in Ukrainian highlights the goal of targeting Ukrainian men specifically, since Russian men usually don’t speak Ukrainian and therefore wouldn’t be affected by the content. Furthermore, the simultaneous presence of a large number of these ‘draft dodger’ songs created the impression of widespread societal acceptance through repetition and algorithmic amplification. In this way, repetition itself became a signal of apparent legitimacy: the more frequently such content circulated, the more easily platforms and audiences could register it as evidence of broader consensus around draft evasion among Ukrainians.

Photo by Jon Tyson on Unsplash

AI Pictures on Facebook Mimicking Sound and Sonic Affect

Visual disinformation follows similar viral patterns. There has been a surge of AI-generated images with war-related content that mimic sound to intensify emotional impact and prompt affective listening—a screaming child amid the rubble, or a crying soldier in a Ukrainian uniform—paired with a patriotic, pro-Ukrainian message that encourages interaction, such as a like or comment. Even without actual sound, such images solicit a kind of affective listening in which suffering is not literally heard but imagined, projected, and emotionally registered through visual cues. While this truth-blurring pattern attracted significant attention among Ukrainians, ironic counter-memes soon emerged mocking its primitive approach.

According to warnings from the Ukrainian online security agency, these accounts aim to interact with pro-Ukrainian users, ultimately adding them as friends or followers. Once they build a large enough audience, they shift the content they share to pro-Russian material. The strategy relies on gathering an audience that is specifically pro-Ukrainian, since these are the users who interact with images of crying soldiers or of Ukrainian suffering at the front. In this sense, the filtering process functions as a form of nonhuman listening at the level of audience formation: platforms and account managers learn which publics respond to particular emotional cues, cultivate those publics through repeated engagement, and later redirect them toward different ideological content. An initially pro-Ukrainian audience is thus gathered, profiled, and ideologically redirected, alienating loyal followers while pulling political opinion in a more pro-Russian direction.

Pro-Russian AI Songs in Germany to Weaken Support for Ukraine

In Germany, AI-generated songs are being utilized as propaganda tools to promote pro-Russian sentiment and anti-Ukrainian views. The right-wing party AfD has embraced AI songs as a potent tool in this regard. Multiple, mostly anonymous YouTube accounts have emerged spreading right-wing ideas, with these songs not only addressing German political issues but also openly supporting Russia. For instance, one song titled “Meine Stimme habt ihr nicht” [You don’t get my vote] features an AI-created avatar of a tall, strong woman holding German and Russian flags; a version of the same song was also released in Russian. The lyrics criticize Germany’s political course, including military aid to Ukraine, and express a desire for friendship with Russia. Its circulation in both German and Russian suggests that listening is being calibrated for different national and linguistic publics, allowing similar political messages to be heard through distinct affective and ideological frames shaped by language, audience, and context.

Contemporary propaganda is increasingly shaped not just by human intent but by rapidly developing nonhuman listening systems—both in production and amplification. Algorithmic listening and perception are exploited to privilege what provokes, not what is true, complicating efforts to regulate digital hate, emotion, and influence. In this context, listening becomes not only a human practice of interpretation, but also a technical system of detection, ranking, and amplification—and, crucially, a site of failure where truth, trust, and perception can no longer be reliably aligned.

Featured Image: Photo by Stanislav Vlasov on Unsplash.

Olga Zaitseva-Herz is an ethnomusicologist working at the intersection of Ukrainian music, war, displacement, and digital culture. She is currently a postdoctoral researcher at the Kule Centre for Ukrainian and Canadian Folklore at the University of Alberta and a guest scholar at Think Space Ukraine at the University of Regensburg. Her research examines how song operates as a medium of political mediation, cultural diplomacy, and historical memory, with a particular focus on popular music and AI-generated sound during Russia’s full-scale invasion of Ukraine. Combining perspectives from ethnomusicology, sound studies, and media analysis, her work investigates how music shapes narratives of resistance, belonging, and global visibility, and how sonic practices illuminate the broader entanglements of culture, technology, and power.

REWIND! . . .If you liked this post, you may also dig:

Hate & Non-Human Listening, an Introduction–Kathryn Huether

Your Voice is (Not) Your Passport–Michelle Pfeifer

Mapping the Music in Ukraine’s Resistance to the 2022 Russian Invasion–Merje Laiapea

SO! Amplifies: An Interactive Map of Music as Ukrainian Resistance to the 2022 Russian Invasion–Merje Laiapea





Hate & Non-Human Listening, an Introduction

In January 2026, WIRED reported that U.S. Immigration and Customs Enforcement (ICE) has begun using Palantir’s AI tools to process public tip-line submissions. The system does not simply store or relay these reports. It processes English-language submissions, condensing them into what is called a “BLUF”—a “bottom line up front” summary that allows agents to quickly assess and prioritize cases. 

Efficiency is the dominant framing: the system promises speed, clarity, and control over overwhelming volumes of information. Yet such efficiency depends on a prior reduction, as expression is detached from the conditions of its articulation and reconstituted as data. In this form, listening no longer risks misunderstanding; it eliminates it.

Nor does this infrastructure operate in isolation. It relies on distributed participation in which listening is recast as vigilance. A recent ICE public X (Twitter) post encouraged residents to report “suspicious activity,” assuring them that doing so would make their communities safer. 

The language is familiar, even reassuring. But it depends on a prior act of interpretation: that certain voices, presences, or behaviors are already legible as threat. Listening here becomes pre-classification—identifying danger in advance and acting on that identification as if it were already known. Rather than an isolated case, this development signals a broader transformation in how immigration and enforcement are governed. As legal and policy analyses increasingly note, artificial intelligence is becoming “one of the fundamental operating tools of policing,” deployed across domains ranging from speech and text analysis to risk assessment and document verification. Systems such as USCIS’s Evidence Classifier, which tags and prioritizes key documents within case files, and platforms like ImmigrationOS, which aggregate data across agencies to guide enforcement decisions, do not simply process information—they reorganize it. What matters is not only what is said, but whether it aligns—across time, across records, across bureaucratic expectations. Listening becomes continuous and anticipatory, oriented toward detecting inconsistency, deviation, and risk before any claim can be made or contested.

A very different narrative circulates alongside these developments. A recent BBC article suggested that AI chatbots can function as unusually “good listeners”—patient, nonjudgmental, even compassionate. Users describe these systems as offering space for reflection, sometimes preferring them to human interlocutors. Yet what is at work is not attention or relation, but pattern recognition trained to simulate understanding. Taken together, these examples reveal a shared transformation. Across both enforcement systems and everyday interaction, listening is increasingly detached from sensation, exposure, and accountability, becoming a process of extraction and classification rather than relation. As Dorothy Santos argues in her account of speech AI, machines do not simply assist human listening; they assume its position, becoming “the listeners to our sonic landscapes” while also acting as the capturers, surveyors, and documenters of our utterances. What follows from this shift is not just a change in who listens, but in what listening is. Listening no longer names an encounter between subjects; it describes a technical operation distributed across infrastructures that register, store, and act on sound without ever hearing it.

This shift is what I call “nonhuman listening.”

Nonhuman listening names both an infrastructural condition and a set of practices through which listening is reorganized as a technical operation. It describes a mode of perception distributed across systems that capture, process, and act on sound without exposure to it as experience, as well as the procedures—classification, ranking, prediction—through which sound is rendered actionable in advance. At stake is not simply the emergence of new technologies, but a reorganization of what listening has long been understood to do. Listening unfolds across thresholds of perception, attention, and care, shaped by what can be sensed, cultivated, or ignored. From its earliest formulations, it has been understood not as passive reception but as an ethically charged capacity. Aristotle’s distinction between akousis (hearing) and akroasis (listening) marks this divide, reserving listening for forms of attention capable of judgment and response. In this sense, listening has always named both openness and control: a posture of receptivity toward others and a way of organizing the world.

Nonhuman listening amplifies an older logic: not all voices are heard, and not all forms of speech register as meaning. Listening does not begin from neutrality. Norms organize it in advance, determining what registers as signal, who gets to hear, and whose speech counts as intelligible. Meaning and noise do not inhere in sound itself; they emerge through historically sedimented expectations about voice, difference, and belonging.

Sound studies has long challenged the assumption that listening inherently connects or humanizes. Listening does not operate as an immediate or intimate relation; it relies on frameworks that precondition perception. Jonathan Sterne shows that claims about sonic immediacy function less as empirical truths than as ideological formations—narratives that naturalize particular social arrangements while obscuring how listening renders some forms of speech legible and others unintelligible. Listening does not simply receive the world—it organizes it.

At the same time, theoretical and experimental approaches foreground the instability of this organization. Voices do not exist as stable entities prior to their mediation; they “show up as real,” as Matt Rahaim writes, through specific practices and infrastructures that render them intelligible, contested, or indeterminate. Jean-Luc Nancy conceptualizes listening as resonance, emphasizing exposure—the possibility that listening might unsettle the subject—while also underscoring that such openness never distributes evenly. John Cage and Pauline Oliveros treat listening as a disciplined practice that requires cultivation and can fail as easily as it attunes. Listening is not given; it is trained.

“Training Machine Listening” CC BY-NC 4.0

Across these accounts, listening operates within regimes of power. Jacques Attali locates listening within governance, where institutions determine what can be heard, what must be silenced, and what becomes disposable. Trauma and memory studies intensify these stakes. Henry Greenspan shows that listening to testimony never occurs as a singular or sufficient act, and that extractive modes of attention can reproduce violence rather than alleviate it. Ralina L. Joseph’s concept of radical listening reframes listening as an ethical orientation—one that demands accountability to power, difference, and fatigue, and that attends to how speakers wish to be heard. As she writes, “the easiest way to refuse to listen is to keep talking.”

Taken together, these accounts point to a more difficult claim: listening is not simply uneven—it is directional. It can orient toward exposure and relation, or toward certainty and verification. When listening turns toward certainty, it no longer encounters speech as an address. It apprehends it in advance while certain voices register not as claims or appeals, but as warnings or threats.

Such orientation has precedents that are neither abstract nor metaphorical. During the 1937 Parsley Massacre, Dominican soldiers used pronunciation as a test of belonging. Suspected Haitians were asked to say the word perejil (parsley); those whose speech did not conform to expected phonetic norms were identified as foreign and often killed. Listening here did not register meaning or intent. It functioned as classification—reducing speech to a signal of difference and acting on that difference as if it were already known.

This logic persists in contemporary enforcement practices, albeit in different registers. Recent encounters with U.S. immigration agents reveal how accent continues to operate as a proxy for suspicion and a trigger for intervention. In multiple reported incidents, individuals have been stopped or detained and asked to account for their citizenship on the basis of how they sound: “Because of your accent,” one agent stated when asked to justify the demand for documentation. In another case, an agent explicitly linked auditory difference to disbelief, telling a driver, “I can hear you don’t have the same accent as me,” before repeatedly questioning where he was born.

In these moments, listening again operates as pre-classification. Accent is not heard as variation, history, or movement, but as evidence—an audible marker of non-belonging that precedes and justifies further scrutiny. What is at stake is not mishearing, but a mode of listening trained to stabilize difference as risk. Speech becomes legible only insofar as it confirms or disrupts an already established expectation of who belongs.

Early analyses of digital surveillance anticipated a more radical transformation than they could yet fully name. Writing in 2014, Robin James identified an emerging “acousmatic” condition in which listening detaches from any identifiable listener and disperses across systems of data capture and analysis. The 2013 Snowden disclosures make clear that this shift was not theoretical but already operational. State surveillance had moved from targeted interception to total capture, amassing communications indiscriminately and deriving “suspicion” only after the fact, as a pattern extracted from within the dataset itself. Listening no longer responds to a known object; it produces the object it claims to detect. What registers as “suspicious” does not precede analysis but materializes through algorithmic filtering, where signal and noise become effects of the system’s design rather than properties of the world. Under these conditions, listening ceases to function as a sensory or interpretive act and instead operates as an infrastructural logic of sorting, ranking, and preemption. Contemporary platforms extend and normalize this logic. They do not hear sound; they process it, rendering it actionable without ever encountering it as experience.

“Social Media Listening” CC BY-NC 4.0

The essays collected in this series extend this transformation across distinct but interconnected domains, tracing how nonhuman listening operates through sound, speech, and platformed media. Across these accounts, listening no longer secures meaning or relation; it becomes a site of contestation, where sound is mobilized, processed, and weaponized within systems that privilege circulation, recognition, and response over truth. Next week, Olga Zaitseva-Herz situates these dynamics within the context of digital warfare, where AI-generated voices, deepfakes, and synthetic media circulate as instruments of psychological manipulation, designed to provoke affective responses that travel faster than verification.

Contemporary speech technologies make this continuity visible at the level of language itself. As work in the Racial Bias in Speech AI series shows, particularly as Michelle Pfeifer demonstrates, speech technologies do not simply fail to recognize certain speakers; they formalize assumptions about what counts as intelligible language in the first place. In these systems, the voice is not encountered as expression but as input—something to be parsed, categorized, and aligned with existing datasets. When AI systems encounter African American Vernacular English—especially emergent idioms shaped by Black and queer communities—language is flattened into surface definitions, stripped of cultural grounding, or flagged as inappropriate. Speech is not heard as situated expression; it is processed as deviation from an unmarked norm.

What emerges is a form of hostile listening: not the misrecognition of a human listener, but a condition in which recognition is structurally foreclosed. Racialized language becomes perpetually at risk–mistrusted or excluded–not because it fails to communicate but because it exceeds the parameters through which the system can register meaning. Hate here is not expressive or intentional; it is procedural, embedded in the standards that determine what can be heard as language at all.

In this sense, the problem is not that listening has been replaced. It is that it continues—without exposure, without relation, without consequence for those who perform it. What appears as neutrality is the absence of risk. What appears as efficiency is the removal of encounters. Under these conditions, harm does not need to be spoken. It is heard into being in advance—stabilized as signal, confirmed as threat, and acted upon before it can be contested. The question that remains is not whether machines can learn to listen better. It is whether we can still recognize listening once it no longer requires us at all.

Kathryn Agnes Huether is a Postdoctoral Research Associate in Antisemitism Studies at UCLA’s Initiative to Study Hate and the Alan D. Leve Center for Jewish Studies. She earned her PhD in musicology with a minor in cultural studies from the University of Minnesota (2021) and holds a second master’s in religious studies from the University of Colorado Boulder. She has held visiting appointments at Bowdoin College and Vanderbilt University and was the 2021–2022 Mandel Center Postdoctoral Fellow at the United States Holocaust Memorial Museum.

Her research examines how sound mediates Holocaust memory, antisemitism, racial violence, and contemporary politics. She has published in Sound Studies and Yuval, and has forthcoming work in the Journal of the Society for American Music and Music and Politics. She is a member of the Holocaust Educational Foundation of Northwestern University’s (HEFNU) Virtual Speakers Bureau, has been an invited educator at two of its regional institutes, and is current editor of ISH’s public-facing blog. Her first book, Sounding Hate: Sonic Politics in the Age of Platforms and AI, is in progress. Her second, Sounding the Holocaust in Film, is a forthcoming teaching compendium that brings together key concepts in Holocaust studies with methods from film music and sound studies.

Series Icon designed by Alex Calovi

REWIND! . . .If you liked this post, you may also dig:

Your Voice is (Not) Your Passport–Michelle Pfeifer

“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens

Beyond the Every Day: Vocal Potential in AI Mediated Communication–Amina Abbas-Nazari

Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso