Mimicked Voices and Nonhuman Listening: AI Deepfakes, Speech, and Sonic Manipulation in the Digital War on Ukraine


The essays collected in this series (link to the Introduction) trace how nonhuman listening operates through sound, speech, and platformed media across distinct but interconnected domains. Across these accounts, listening no longer secures meaning or relation; it becomes a site of contestation, where sound is mobilized, processed, and weaponized within systems that privilege circulation, recognition, and response over truth. In this contribution, Olga Zaitseva-Herz examines how nonhuman listening operates under conditions of war, where AI-generated voices and deepfakes destabilize the very grounds of auditory trust. Through the case of Ukraine, she shows how platforms and political actors alike exploit algorithmic listening systems to amplify affect, circulate disinformation, and transform voice into a tool of psychological warfare. Listening, in this context, becomes not a means of understanding but a terrain of uncertainty. –Guest Editor Kathryn Huether
—
Russia’s full-scale invasion of Ukraine has unfolded as the most digitally mediated war to date, shaped not only by what circulates online but by how content is heard, interpreted, and amplified. Here, listening is not limited to human hearing: it also includes algorithmic systems that detect, rank, and amplify content, as well as political actors and online publics who interpret and recirculate it. Social media platforms—Telegram, Instagram, TikTok, Facebook—have become sites of psychological warfare where AI-generated audio, video, text, and image-based content are crafted to manipulate perception and provoke rapid emotional responses, often through algorithmic systems attuned to virality and affect. Ukrainian political authorities regularly caution users by saying that everything one reads, hears, or sees could be a psychological weapon. This is not rhetorical. Content is often designed to produce outrage, shock, and despair—emotions that travel quickly across platforms and influence public mood.
AI is used to create fake news videos, synthetic voices, and deepfake conversations, complicating how authenticity is heard and assessed. Some recordings simulate “leaked” phone calls revealing political dissent or strategic plans, which are then shared on platforms such as Telegram, Instagram, and Facebook. At the same time, because real voices can now be convincingly generated with AI, anyone can claim that a genuine recording of their voice is AI-generated. A widely circulated case involved Russian music producer Iosif Prigozhin, whose alleged call criticizing the Kremlin provoked significant backlash. Soon after, he claimed the recording was an AI forgery, a statement whose truth remains unclear but which strategically exploits growing public awareness of deepfakes as a means of discrediting or distancing oneself from damaging material. Deepfakes thus do not merely deceive; they also destabilize the conditions of listening and trust, turning listening itself into a site of strategic uncertainty. This uncertainty exploits a growing crisis of trust in listening itself, where voices can always be disavowed as synthetic. Against this backdrop, music and voice emerge as especially powerful media for manipulation, parody, retaliation, and symbolic struggle.

AI Songs as a Tool of Revenge
AI generative tools are also used for irony or parody, as in the viral remake “Samotni Moskali” [Lonely Muscovites], which mocks the Ukrainian pop star Ani Lorak, who moved to Russia. On November 13, 2023, the Telegram channel of Ukrainian journalist and politician Anton Gerashchenko posted a video remake of Ani Lorak’s old song “Poludneva Speka” [Midday Heat], renamed “Samotni Moskali.” The video quickly went viral on social media. Her big hit from the ’00s was remade into strongly pro-Ukrainian content, featuring clips from the current frontlines to illustrate new lyrics rendered in an AI voice engineered to closely mimic Lorak’s vocal timbre and affect. The parody relies on listeners’ recognition of her voice and affective style, while the imitation introduces a sharp shift in content between the original and the synthetic lyrics.
This social media burst was a response to Ani Lorak’s claimed political neutrality in the context of Russia’s full-scale war against Ukraine, despite clear signs that she supported Russia. Her Ukrainian audience felt betrayed, and the remake seemed aimed both at revenge and at a public break with her Ukrainian fan base, making the impact of her choices visible. Many satirical memes followed on social media, including AI-generated songs related to her stage persona. Their creators knew that, under current Russian politics, she could get into trouble in Russia if the government took the promoted “support” for the Ukrainian army seriously. The revenge group went even further by creating a website called “Ani Lorak Foundation,” entirely dedicated to fundraisers for the Ukrainian army and presented as Lorak’s own project, in which she showcases her support for Ukrainian battalions. Some military drones deployed by the Ukrainian side even ended up bearing stickers with the name of the “Ani Lorak Foundation.” This case demonstrates how AI tools became instruments of public satire, sabotage, and protest in the context of the current full-scale war.
AI Songs as a Weapon
During the full-scale invasion, Russia has been using AI-generated music as a weapon for propaganda and disinformation. In 2023, multiple songs in Ukrainian were created to disrupt Ukraine’s military mobilization efforts and went viral. One of these, the song “Mamo, Ia Ukhyliant” [Mother, I am a Draft Dodger], became particularly popular in a multitude of variations. Their circulation shows how platforms “listen” to wartime content through metrics of repetition, provocation, and affective intensity, amplifying messages not because they are true, but because they are likely to generate reaction and spread. These songs were algorithmically promoted on TikTok and successfully sparked a viral challenge aimed at undermining Ukraine’s mobilization in 2024 by encouraging Ukrainian men to evade the draft, flee, and party abroad instead. In response, Ukrainian intelligence released an official statement identifying these songs as products of a Russian disinformation campaign.
This example shows how AI-generated songs are actively used as powerful tools of war, spreading political messages and influencing people’s political choices. The fact that all these songs about draft evasion were released in Ukrainian also highlights the goal of targeting Ukrainian men specifically, since Russian men usually don’t speak Ukrainian and therefore wouldn’t be affected by the content. Furthermore, the simultaneous presence of a large number of these “draft dodger” songs created the impression of widespread societal acceptance through repetition and algorithmic amplification. In this way, repetition itself became a signal of apparent legitimacy: the more frequently such content circulated, the more easily platforms and audiences could register it as evidence of broader consensus around draft evasion among Ukrainians.
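The amplification logic described in this section can be made concrete with a toy ranking function. The following is only a minimal sketch under stated assumptions: the `Post` fields, the weights, and the example numbers are all hypothetical and do not describe TikTok's or any other platform's actual system. The point it illustrates is structural: a ranker that scores only reaction and repetition never consults whether a message is true.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    shares: int
    comments: int
    reuploads: int  # repetition: how many near-identical variations circulate
    is_true: bool   # known to the analyst, never read by the ranker below


def engagement_score(post: Post) -> float:
    # Hypothetical weights: reaction and repetition are rewarded,
    # and the is_true field plays no role in the score.
    return 1.0 * post.shares + 2.0 * post.comments + 3.0 * post.reuploads


def rank_feed(posts: list[Post]) -> list[Post]:
    # Highest engagement first, regardless of accuracy.
    return sorted(posts, key=engagement_score, reverse=True)


# Invented example data, for illustration only.
posts = [
    Post("Official mobilization notice", shares=120, comments=15, reuploads=1, is_true=True),
    Post("Draft-dodger song, variation A", shares=900, comments=400, reuploads=60, is_true=False),
    Post("Fact-check of the song", shares=200, comments=50, reuploads=2, is_true=True),
]

feed = rank_feed(posts)
for post in feed:
    print(f"{engagement_score(post):8.1f}  {post.title}")
```

In this invented feed the song outranks both the official notice and the fact-check simply because it provokes more reaction and exists in more variations, which is the sense in which repetition can pass for legitimacy.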

AI Pictures on Facebook Mimicking Sound and Sonic Affect
Visual disinformation follows similar viral patterns. There has been a surge of AI-generated images with war-related content, often mimicking sound to intensify emotional impact and prompt affective listening by showing a screaming child amid the rubble or a crying soldier in a Ukrainian uniform, paired with a patriotic, pro-Ukrainian message that encourages interaction, such as a like or comment. Even without actual sound, such images solicit a kind of affective listening in which suffering is not literally heard but imagined, projected, and emotionally registered through visual cues. Meanwhile, although this truth-blurring pattern attracted significant attention among many Ukrainians, ironic counter-memes emerged, mocking its primitive approach.
According to warnings from the Ukrainian online security agency, these accounts aim to interact with pro-Ukrainian users, ultimately adding them as friends or followers. Then, when they build a large enough audience, they shift the type of content they share to pro-Russian. The strategy relies on gathering an audience that is specifically pro-Ukrainian, as they interact with images of crying soldiers or the suffering of the Ukrainian people at the front. In this sense, the filtering process functions as a form of nonhuman listening at the level of audience formation: platforms and account managers learn which publics respond to particular emotional cues, cultivate those publics through repeated engagement, and later redirect them toward different ideological content. This creates a filtering mechanism through which an initially pro-Ukrainian audience is gathered, profiled, and later ideologically redirected, alienating loyal followers while pulling political opinion in a more pro-Russian direction.
Pro-Russian AI Songs in Germany to Weaken Support for Ukraine
In Germany, AI-generated songs are being used as propaganda tools to promote pro-Russian sentiment and anti-Ukrainian views. The right-wing party AfD has embraced AI songs as a potent tool in this regard. Multiple, mostly anonymous YouTube accounts have emerged spreading right-wing ideas, with songs that not only address German political issues but also openly support Russia. For instance, one song titled “Meine Stimme Habt ihr nicht” [You don’t get my vote] features an AI-created avatar of a tall, strong woman holding German and Russian flags. A version of the same song was also released in Russian. The lyrics criticize Germany’s political course, including military aid to Ukraine, and express a desire to be friends with Russia. Its circulation in both German and Russian suggests that listening is being calibrated for different national and linguistic publics, allowing similar political messages to be heard through distinct affective and ideological frames shaped by language, audience, and context.
Contemporary propaganda is increasingly shaped not just by human intent but by rapidly developing nonhuman listening systems—both in production and amplification. Algorithmic listening and perception are exploited to privilege what provokes, not what is true, complicating efforts to regulate digital hate, emotion, and influence. In this context, listening becomes not only a human practice of interpretation, but also a technical system of detection, ranking, and amplification—and, crucially, a site of failure where truth, trust, and perception can no longer be reliably aligned.
—
Featured Image: Photo by Stanislav Vlasov on Unsplash.
—
Olga Zaitseva-Herz is an ethnomusicologist working at the intersection of Ukrainian music, war, displacement, and digital culture. She is currently a postdoctoral researcher at the Kule Centre for Ukrainian and Canadian Folklore at the University of Alberta and a guest scholar at Think Space Ukraine at the University of Regensburg. Her research examines how song operates as a medium of political mediation, cultural diplomacy, and historical memory, with a particular focus on popular music and AI-generated sound during Russia’s full-scale invasion of Ukraine. Combining perspectives from ethnomusicology, sound studies, and media analysis, her work investigates how music shapes narratives of resistance, belonging, and global visibility, and how sonic practices illuminate the broader entanglements of culture, technology, and power.
—

REWIND! . . .If you liked this post, you may also dig:
Hate & Non-Human Listening, an Introduction–Kathryn Huether
Your Voice is (Not) Your Passport—Michelle Pfeifer
Mapping the Music in Ukraine’s Resistance to the 2022 Russian Invasion—Merje Laiapea
SO! Amplifies: An Interactive Map of Music as Ukrainian Resistance to the 2022 Russian Invasion—Merje Laiapea
Acousmatic Surveillance and Big Data

It’s an all too familiar movie trope. A bug hidden in a flower jar. A figure in shadows crouched listening at a door. The tape recording that no one knew existed, revealed at the most decisive of moments. Even the abrupt disconnection of a phone call manages to arouse the suspicion that we are never as alone as we may think. And although surveillance derives its meaning from the Latin “vigilare” (to watch) and French “sur-” (over), its deep connotations of listening have all but obliterated that distinction.
Here we move on from cybernetic games to modes of surveillance that work through composition and patterns. Robin James challenges us to consider the unfamiliar resonances produced by our IP addresses, search histories, credit trails, and Facebook posts. How does the NSA transform our data footprints into the sweet, sweet music of surveillance? Shhhhhhhh! Let’s listen in. . . -AT
—
Kate Crawford has argued that there’s a “big metaphor gap in how we describe algorithmic filtering.” Specifically, its “emergent qualities” are particularly difficult to capture. This process, algorithmic dataveillance, finds and tracks dynamic patterns of relationships amongst otherwise unrelated material. I think that acoustics can fill the metaphor gap Crawford identifies. Because of its focus on identifying emergent patterns within a structure of data, rather than its cause or source, algorithmic dataveillance isn’t panoptic, but acousmatic. Algorithmic dataveillance is acousmatic because it does not observe identifiable subjects, but ambient data environments, and it “listens” for harmonics to emerge as variously-combined data points fall into and out of phase/statistical correlation.
Dataveillance defines the form of surveillance that saturates our consumer information society. As this promotional Intel video explains, big data transcends the limits of human perception and cognition – it sees connections we cannot. And, as is the case with all superpowers, this is both a blessing and a curse. Although I appreciate emails from my local supermarket that remind me when my favorite bottle of wine is on sale, data profiling can have much more drastic and far-reaching effects. As Frank Pasquale has argued, big data can determine access to important resources like jobs and housing, often in ways that reinforce and deepen social inequities. Dataveillance is an increasingly prominent and powerful tool that determines many of our social relationships.
The term dataveillance was coined in 1988 by Roger Clarke, and refers to “the systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons.” In this context, the person is the object of surveillance and data is the medium through which that surveillance occurs. Writing 20 years later, Michael Zimmer identifies a phase-shift in dataveillance that coincides with the increased popularity and dominance of “user-generated and user-driven Web technologies” (2008). These technologies, found today in big social media, “represent a new and powerful ‘infrastructure of dataveillance,’ which brings about a new kind of panoptic gaze of both users’ online and even their offline activities” (Zimmer 2007). Metadataveillance and algorithmic filtering, however, are not variations on panopticism, but practices modeled—both historically/technologically and metaphorically—on acoustics.
In 2013, Edward Snowden’s infamous leaks revealed the nuts and bolts of the National Security Agency’s massive dataveillance program. The agency was collecting data records that, according to the Washington Post, included “e-mails, attachments, address books, calendars, files stored in the cloud, text or audio or video chats and ‘metadata’ that identify the locations, devices used and other information about a target.” The most enduringly controversial aspect of NSA dataveillance programs has been the bulk collection of Americans’ data and metadata, in other words, the “big data”-veillance programs.
Instead of intercepting only the communications of known suspects, this big dataveillance collects everything from everyone and mines that data for patterns of suspicious behavior; patterns that are consistent with what algorithms have identified as, say, “terrorism.” As Cory Doctorow writes in BoingBoing, “Since the start of the Snowden story in 2013, the NSA has stressed that while it may intercept nearly every Internet user’s communications, it only ‘targets’ a small fraction of those, whose traffic patterns reveal some basis for suspicion.” “Suspicion,” here, is an emergent property of the dataset, a pattern or signal that becomes legible when you filter communication (meta)data through algorithms designed to hear that signal amidst all the noise.
Hearing a signal from amidst the noise, however, is not sufficient to consider surveillance acousmatic. “Panoptic” modes of listening and hearing, though epitomized by the universal and internalized gaze of the guards in the tower, might also be understood as the universal and internalized ear of the confessor. This is the ear that, for example, listens for conformity between bodily and vocal gender presentation. It is also the ear of audio scrobbling, which, as Calum Marsh has argued, is a confessional, panoptic music listening practice.
Therefore, when President Obama argued that “nobody is listening to your telephone calls,” he was correct. But only insofar as nobody (human or AI) is “listening” in the panoptic sense. The NSA does not listen for the “confessions” of already-identified subjects. For example, this court order to Verizon doesn’t demand recordings of the audio content of the calls, just the metadata. Again, the Washington Post explains:
The data doesn’t include the speech in a phone call or words in an email, but includes almost everything else, including the model of the phone and the “to” and “from” lines in emails. By tracing metadata, investigators can pinpoint a suspect’s location to specific floors of buildings. They can electronically map a person’s contacts, and their contacts’ contacts.
NSA dataveillance listens acousmatically because it hears the patterns of relationships that emerge from various combinations of data—e.g., which people talk and/or meet where and with what regularity. Instead of listening to identifiable subjects, the NSA identifies and tracks emergent properties that are statistically similar to already-identified patterns of “suspicious” behavior. Legally, the NSA is not required to identify a specific subject to surveil; instead they listen for patterns in the ambience. This type of observation is “acousmatic” in the sound studies sense because the sounds/patterns don’t come from one identifiable cause; they are the emergent properties of an aggregate.
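To make the idea of mapping “a person’s contacts, and their contacts’ contacts” from metadata alone more tangible, here is a minimal sketch of two-hop contact chaining. It is an illustration under stated assumptions, not a description of any actual NSA tool: the call records, the single-letter names, and the helper functions `build_graph` and `within_hops` are all hypothetical. Note that the records contain only who contacted whom, never what was said.

```python
from collections import defaultdict

# Hypothetical call metadata: (caller, callee) pairs, with no audio content.
call_records = [
    ("A", "B"),
    ("B", "C"),
    ("C", "D"),
    ("A", "E"),
    ("F", "G"),
]


def build_graph(records):
    """Turn pairwise call records into an undirected contact graph."""
    graph = defaultdict(set)
    for caller, callee in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, start, hops):
    """Return everyone reachable from `start` in at most `hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one step outward, skipping anyone already reached.
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}


graph = build_graph(call_records)
print(sorted(within_hops(graph, "A", 2)))  # ['B', 'C', 'E']
```

Nothing in this sketch begins from an identified suspect's words. The pattern of relations is the object of attention, and anyone who falls within the pattern becomes audible to it.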
Acousmatic listening is a particularly appropriate metaphor for NSA-style dataveillance because the emergent properties (or patterns) of metadata are comparable to harmonics or partials of sound, the resonant frequencies that emerge from a specific combination of primary tones and overtones. If data is like a sound’s primary tone, metadata is its overtones. When two or more tones sound simultaneously, harmonics emerge when overtones vibrate with and against one another. In Western music theory, something sounds dissonant and/or out of tune when the harmonics don’t vibrate synchronously or proportionally. Similarly, tones that are perfectly in tune sometimes create a consonant harmonic. The NSA is listening for harmonics. They seek metadata that statistically correlates to a pattern (such as “terrorism”), or is suspiciously out of correlation with a pattern (such as US “citizenship”). Instead of listening to identifiable sources of data, the NSA listens for correlations among data.
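What it might mean for metadata to “statistically correlate to a pattern” can be sketched with a plain Pearson correlation. This is a deliberately simplified illustration, not the NSA's method: the weekly activity profiles, the reference pattern, the profile names, and the 0.8 threshold are all invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical reference pattern: message counts per weekday that an
# analyst has already labeled "suspicious".
reference_pattern = [0, 0, 5, 9, 5, 0, 0]

# Hypothetical aggregate profiles; no names and no message content involved.
profiles = {
    "profile_1": [1, 0, 6, 10, 4, 1, 0],  # rises and falls with the reference
    "profile_2": [7, 8, 1, 0, 1, 8, 7],   # moves against the reference
}

THRESHOLD = 0.8  # arbitrary cutoff chosen for this illustration

flagged = [
    name
    for name, counts in profiles.items()
    if pearson(counts, reference_pattern) >= THRESHOLD
]
print(flagged)  # ['profile_1']
```

In the terms of the metaphor, the first profile falls into phase with the reference pattern and the second falls out of phase. What gets flagged is the correlation, not any identifiable source.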
Both panopticism and acousmaticism are technologies that incite behavior and compel people to act in certain ways. However, they use different methods, which, in turn, incite different behavioral outcomes. Panopticism maximizes efficiency and productivity by compelling conformity to a standard or norm. According to Michel Foucault, the outcome of panoptic surveillance is a society where everyone synchs to an “obligatory rhythm imposed from the outside” (151-2), such as the rhythmic divisions of the clock (150). In other words, panopticism transforms people into interchangeable cogs in an industrial machine. Methodologically, panopticism demands self-monitoring. Foucault emphasizes that panopticism functions most efficiently when the gaze is internalized, when one “assumes responsibility for the constraints of power” and “makes them play…upon himself” (202). Panopticism requires individuals to synchronize themselves with established compulsory patterns.
Acousmaticism, on the other hand, aims for dynamic attunement between subjects and institutions, an attunement that is monitored and maintained by a third party (in this example, the algorithm). For example, Facebook’s News Feed algorithm facilitates the mutual adaptation of norms to subjects and subjects to norms. Facebook doesn’t care what you like; instead it seeks to transform your online behavior into a form of efficient digital labor. In order to do this, Facebook must adjust, in part, to you. Methodologically, this dynamic attunement is not a practice of internalization: unlike Foucault’s panopticon, big dataveillance leverages outsourcing and distribution. There is so much data that no one individual, indeed no one computer, can process it efficiently and intelligibly. The work of dataveillance is distributed across populations, networks, and institutions, and the surveilled “subject” emerges from that work (for example, Rob Horning’s concept of the “data self”). Acousmaticism tunes into the rhythmic patterns that synch up with and amplify its cycles of social, political, and economic reproduction.
Unlike panopticism, which uses disciplinary techniques to eliminate noise, acousmaticism uses biopolitical techniques to allow profitable signals to emerge as clearly and frictionlessly as possible amid all the noise (for more on the relation between sound and biopolitics, see my previous SO! essay). Acousmaticism and panopticism are analytically discrete, yet applied in concert. For example, certain tiers of the North Carolina state employees’ health plan require so-called “obese” and tobacco-using members to commit to weight-loss and smoking-cessation programs. If these members are to remain eligible for their selected level of coverage, they must track and report their program-related activities (such as exercise). People who exhibit patterns of behavior that are statistically risky and unprofitable for the insurance company are subject to extra layers of surveillance and discipline. Here, acousmatic techniques regulate the distribution and intensity of panoptic surveillance. To use Nathan Jurgenson’s turn of phrase, acousmaticism determines “for whom” the panoptic gaze matters. To be clear, acousmaticism does not replace panopticism; my claim is more modest. Acousmaticism is an accurate and productive metaphor for theorizing both the aims and methods of big dataveillance, which is, itself, one instrument in today’s broader surveillance ensemble.
–
Featured image “Big Brother 13/365” by Dennis Skley CC BY-ND.
–
Robin James is Associate Professor of Philosophy at UNC Charlotte. She is author of two books: Resilience & Melancholy: pop music, feminism, and neoliberalism will be published by Zer0 books this fall, and The Conjectural Body: gender, race and the philosophy of music was published by Lexington Books in 2010. Her work on feminism, race, contemporary continental philosophy, pop music, and sound studies has appeared in The New Inquiry, Hypatia, differences, Contemporary Aesthetics, and the Journal of Popular Music Studies. She is also a digital sound artist and musician. She blogs at its-her-factory.com and is a regular contributor to Cyborgology.
—
REWIND!…If you liked this post, check out:
“Cremation of the senses in friendly fire”: on sound and biopolitics (via KMFDM & World War Z)–Robin James
The Dark Side of Game Audio: The Sounds of Mimetic Control and Affective Conditioning–Aaron Trammell
Listening to Whisperers: Performance, ASMR Community, and Fetish on YouTube–Joshua Hudelson




















