“I Am Thinking Of Your Voice”: Gender, Audio Compression, and a Sonic Cyberfeminist Theory of Oppression

I developed the text I recite in this post as the theoretical framework for an article I’m working on about audio compression. As I was working on the article, I wondered about the role of gender and race in the research on audio compression. Specifically, I was reminded of the central role Suzanne Vega’s “Tom’s Diner” played in the research that led to the mp3. Karlheinz Brandenburg used the song to test the compression method he was developing for mp3s because it sounded “warm.” Sure, the track is very intimate and Vega’s voice is soft and vulnerable. But to what extent is its “warmth” the effect of a man’s perception of Vega addressing him as either/both an intimate partner or caregiver? Is its so-called warmth dependent upon the extent to which Vega’s voice performs idealized white hetero femininity, a role from which patriarchy definitely expects warmth (intimacy, care work) but can’t be bothered to hear anything beyond or other than that from (white) women?

“Suzanne Vega 13. Inselleuchten 02” by Wikimedia Commons user Olaf Tausch under GFDL license (http://www.gnu.org/copyleft/fdl.html) or CC BY 3.0 license (https://creativecommons.org/licenses/by/3.0)

In other words, I’m wondering in what ways our compression practices are shaped by white supremacist, patriarchal listening ears. Before anyone even runs an audio signal through a compressor, how do patriarchal gender systems already themselves act as a kind of epistemological and sensory compression that separates out essential from inessential signal, such that we let women’s warm, caring voices through while also demanding they discipline themselves into compressing their anger and rage away?

The literature does address the role of sexism and ableism in the shaping of audio technologies, but this critique is most commonly framed in conventionally liberal terms that understand oppression as a matter of researcher bias that excludes and censors minority voices. For example, the literature addresses the way “cultural differences like gender, age, race, class, nationality, and language” are overlooked by researchers (Jonathan Sterne), offers cursory nods to the biases and preferences of white cis men scientists (Ryan Maguire), or claims that “the principles of efficiency and universality central to the history of signal processing also worked to censure atypical voices and minor modes of communication” (Mara Mills). Though such analyses are absolutely necessary components of sonic cyberfeminist practice, they are not sufficient.

“Untitled” by Flickr user Charlotte Cooper, CC-BY-2.0

We also need to consider the ways frequencies get parsed into the structural positions that masculinity and femininity occupy in Western patriarchal gender systems. Patriarchy doesn’t just influence researchers, their preferences, their choices, and their judgments. How is the break between essential and inessential signal mapped onto the gendered break between what Beauvoir calls “Absolute” and “Other,” masculine and feminine? Patriarchy is not just a relation among people; it is also a relation among sounds. I don’t think this is inconsistent with the positions I cited earlier in this paragraph; rather, I am pursuing the concerns that motivate those positions a bit more emphatically. And this is perhaps because our objects of analysis are slightly different: I’m a political philosopher interested in political structures that shape epistemologies and ontologies—such as the patriarchal gender system organized by masculine absolute/feminine other—whereas most of the scholars I cited earlier have a more STS- and media-studies-approach that is interested in material culture.

As a way to address these questions, I made a short critical karaoke-style sound piece where I read a shortened version of the text below over the original version of “Tom’s Diner” from Vega’s album Solitude Standing (which, for what it’s worth, I first owned on cassette, not digitally). I recorded my voice reciting a condensed version of the framework I develop for a sonic cyberfeminist theory of oppression over a copy of the original, a cappella version of “Tom’s Diner.” If I were in philosopher mode, I would theorize the full implications of this aesthetic choice, but I’m offering this as a sound art piece, the material and sensory dimensions of which provide y’all the opportunity to think through those implications yourselves.

[Text from audio]

Perceptual coding and perceptual technics create breaks in the audio spectrum in the same way that neoliberalism and biopolitics create breaks in the spectrum of humanity. Perceptual coding refers to “those forms of audio coding that use a mathematical model of human hearing to actively remove sound in the audible part of the spectrum under the assumption that it will not be heard” (loc 547). Neoliberalism and biopolitics use a mathematical model of human life to actively remove people from eligibility for moral and political personhood on the assumption that they will not be missed. They each use the same basic set of techniques: a normalized model of hearing, the market, or life defines the parameters of what should be included and what should be disposed of, in order to maximize the accumulation of private property/personhood.

These parameters are not objective but grounded in what Jennifer Lynne Stoever calls a “listening ear”: “a socially constructed ideological system producing but also regulating cultural ideas about sound” (13). Perceptual coding uses white supremacist, capitalist presumptions about the limits of humanity to mark a break in what counts as sound and what counts as noise…such as presumptions about feminine voices like Suzanne Vega’s.

Perceptual coding subjects audio frequencies to the same techniques of government and management that neoliberalism and biopolitics subject people to. For this reason, it can serve as a specifically sonic cyberfeminist theory of oppression.

It shows us not just how oppression works under neoliberalism and biopolitics, but also its motivations and effects. The point is to increase the efficient accumulation of personhood as property by white supremacist capitalist patriarchal institutions. Privilege is the receipt of social investment and the ability to build on it by access to circulation. Oppression is the denial of this investment and access to circulation. For example, mass incarceration takes people of color out of circulation and subjects them to carceral logics…because this is the way such populations are most profitable for neoliberal and biopolitical white supremacist capitalist patriarchy.

Featured image: “Solo show: Order and Progress at Fabio Paris Art Gallery (Brescia, 15 January 2011)” by Flickr user Roͬͬ͠͠͡͠͠͠͠͠͠͠͠sͬͬ͠͠͠͠͠͠͠͠͠aͬͬ͠͠͠͠͠͠͠ Menkman, CC BY-NC 2.0

Robin James is Associate Professor of Philosophy at UNC Charlotte. She is author of two books: Resilience & Melancholy: pop music, feminism, and neoliberalism, published by Zer0 books last year, and The Conjectural Body: gender, race and the philosophy of music, published by Lexington Books in 2010. Her work on feminism, race, contemporary continental philosophy, pop music, and sound studies has appeared in The New Inquiry, Hypatia, differences, Contemporary Aesthetics, and the Journal of Popular Music Studies. She is also a digital sound artist and musician. She blogs at its-her-factory.com and is a regular contributor to Cyborgology.

Tape Hiss, Compression, and the Stubborn Materiality of Sonic Diaspora

In an article for Pitchfork, music critic Adam Ward reminisces about digital music files that sound as if they’re “being played through a payphone,” and calls the extreme compression of the low-quality MP3 “this generation’s vinyl crackle or skipping CD.” The crackles, hisses, and compression that characterize such sound files are what I term “encoded materiality.”  Focusing on the encoded materiality of the digital helps us to reconfigure our approach to sonic media, understanding how the compression of early MP3s and tape hiss remind us not only of lost fidelity, but also of the richness of exchange. These warm and stubborn sonic impurities, having been encoded in our digital listening formats and thus achieving repeatability and variability, act as persistent reminders that we can think diaspora beyond melancholy and authenticity, sidestepping the questions of purity and loss that so often characterize dialogues in the field of diaspora studies.

In Mechanisms, his work on electronic textuality, Matthew Kirschenbaum proposes a “material matrix governing writing and inscription in all forms” composed of four elements: “erasure, variability, repeatability, and survivability” (xiii). The defects of sonic technology that become encoded in digital files are one such type of inscription. Tape hiss and other recording accidents–such as Casey Kasem ruining your attempt to tape record the first Western song you fell in love with after leaving Hong Kong by fading the outro and butting in with his banter–achieve repetition and survival during the digital encoding process, becoming a welcome reminder of time and place. Such materiality helps us to better understand the politics of diaspora. It clues us in to how the elements of textual encoding (erasure, variability, repeatability, and survivability) become embedded within diaspora’s complex logic.

Image by DraconianRain @Flickr CC BY-NC.

To think through these complex moments of exchange, let me offer a story about my experience with tape hiss. I grew up listening to music touched by this particular sonic grain: a ground level of noise upon which my sonic experiences were built. After I received my first iPod in 2005, I connected a tape player to the input of my computer, recorded a stack of tapes, and then manually split them into MP3s—pseudo-piracy committed in earnest. A few weeks ago, I dug up these same files and put them on my phone, once again returning the buried albums to their former glory on a constant rotation playlist. I keep returning to these particular files, rather than finding the now easily available digital versions, because I admire the survivability of their materiality. The materiality of these tracks allowed me to trace the complexity of my own history—the tape hiss is just as much a part of this history as the songs themselves.

After first moving to Canada from Hong Kong, my family and I established ourselves by unswervingly performing the same routine each weekend. We would have late lunch at our favorite dim sum restaurant, drive around for a bit, and then relax at home; there wasn’t much to do in the ex-urbs of Toronto. On those drives, we listened to selections from a stack of cassette tapes in the glovebox of our old Pontiac Bonneville. Sally Yeh’s 1987 album Blessing was on constant rotation and received its fair share of wear. This was one of the tapes I recorded to my computer, destined for digitization.

Because I hit the record button a few seconds early, my MP3 of Sally Yeh’s Blessing begins with a few seconds of silence. It’s enough to trick me into thinking that the song isn’t playing. In a quiet enough spot, I can hear that it’s actually tape hiss. No matter where I am, on the road or in the shower, my mind fills in the blank with the thick ker-chunk of the cassette entering that Pontiac stereo right before that familiar tape hiss would fill the car, always giving us a few sometimes-needed, sometimes-awkward moments of silence before the music started. The sonic texture of that tape stems from its material nature as plastic and metal. The hiss itself is due to the size of the magnetized particles on the plastic. Because of these sounds, the song tells its own story. It recalls our shared sonic and material experience as I migrate it from device to device.

Before Blessing made its way into our car, it was one of the few cassette tapes that my parents carefully packed into a dozen cardboard boxes and shipped by sea to Canada in the late 1980s. This was in the midst of the countrywide protests in China that led to the events at Tiananmen Square. That insistent ker-chunk of plastic on metal that my brain inserts every time I play the MP3s keeps my experience of the music grounded in this earlier history, too. Strange that a fluffy pop song would remind me of the serious political strife taking place on the doorstep of a Hong Kong nervously awaiting its “handover.” This sonic anchor’s ability to recall to me these snippets of history, at once personal, national, and transpacific, has been crucial in the development of my own diasporic identity. Listening to this particular recording of Blessing helps me to keep track of my self and my history.


The act of withdrawal that many of us perform in order to interface with our sonic technologies, as Alexander Weheliye shows in his reading of Ralph Ellison’s Invisible Man in Phonographies, can play a powerful role in understanding one’s own racial subjectivity. Weheliye focuses on the scene in which the titular narrator-protagonist retreats to a subterranean cave-like space to listen to Louis Armstrong’s recorded, disembodied voice in complete solitude. He asserts that the narrator builds his own subjectivity through a recognition of the self by projecting that self onto Louis Armstrong’s “vocal apparatus,” that is, his voice coming through a phonograph (143). “The phonograph’s ability to disconnect the singing voice from its face, or rather to replace it with a technological visage, further heightens its materiality, which impels the protagonist to imbue Armstrong’s voice with a surplus of signification” (Weheliye 145).

More than a black and white photo or a stern historical lecture from the elders, the “heightened materiality” of the digital format, a type of “technological visage,” cathects my own diasporic history most forcefully to the sonic anchor of tape hiss because it acts as a “voice without a face” in the same way as the phonographic Armstrong. But despite the privacy of the phonographic listening act in this scenario, Weheliye suggests that

the phonographic listening modality also bears the traces of sociality… since the listening subject is drawn out of him/herself by encountering the technologically mediated sounds of other subjects—we might even go so far as to suggest that the phonograph itself functions as a subject, especially in its interfacings with various humans. (165)

So it is with similar sonic technologies that can encourage the “eschewing [of] the social” such as iPhones, CDs, and, yes, cassette tapes. Like Ellison’s narrator interfacing with the mechanical apparatus that conveys Armstrong’s voice, the insistent “defects” kept on the digital file keep the mechanism of its delivery at the fore, allowing me first to understand that diasporic feeling of dis-ease—and to imagine beyond it.

Sally Yeh's "Blessing." Image used with permission by the author.

What I gain from the digital yet still stubbornly material tape of Blessing is not any overt lyrical or thematic gesture to a diasporic subjectivity on the artist’s part, but rather an induction into what Giorgio Agamben calls “the idea of an inessential commonality, a solidarity that in no way concerns an essence” (18), or perhaps a community based on “belonging itself” (84). Likewise, Weheliye’s “diasporic citizenship coarticulate[s] the national and transnational instead of playing a zero-sum game with political identification” (369). If diaspora is defined by the perpetual desire to seek an imagined originary point of true identity that inevitably leads to melancholy, as psychoanalysis maintains, tape hiss and other encoded materialities turn the gaze away from the mists of origin, validating instead the development of diasporic identity in the aftermath of emigration. Of course, loss and melancholy are legitimate psychic aspects of the diasporic experience, as persuasively demonstrated by scholars such as David Eng, Shinhee Han, and Anne Anlin Cheng, but they neither define the whole experience nor are they mutually exclusive to it. It is in this way that we can think of diaspora as a community of belonging by becoming.

A consideration of the stubborn ways that materiality is encoded in the digital helps us to think of diaspora as more than psychic fait accompli—it is also a ‘coming community’ characterized by the process of belonging. Kirschenbaum’s matrix provides the right foundation for a study which considers how material inscriptions are related to our diasporic lives. The inscription that defined my diasporic becoming came from the cassette tape that travelled across the ocean in a boat for five weeks, escaped erasure, survived repeated playings, became digital, and lives on now as a hissing reminder of our history of emigration. What else may we find about our own becoming and belonging if we attune our ears to the encoded materialities of sonic diaspora?

Featured image “Decayed Cassette” by darkday @Flickr CC BY.

Chris Chien is a Ph.D. candidate in the Department of English at the University of Southern California working variously in the areas of sound, diaspora and transpacific studies, all with a distinctly queer bent. He completed his M.A. in English Literature at Loyola Marymount University and his Honors B.A. in English Literature and Latin at the University of Toronto. Chris has presented papers on angelic gender fluidity in John Milton's Paradise Lost and post-colonial affect in the work of Herman Melville and Amitav Ghosh at the Rocky Mountain MLA and South Atlantic MLA conferences respectively. He is currently developing a paper that examines the performativity of diaspora, masculinity, and the capitalist ethos in Eddie Huang's memoir Fresh Off the Boat and its adaptation as an ABC sitcom.

A Brief History of Auto-Tune

This is the final article in Sounding Out!’s April Forum on “Sound and Technology.” Every Monday this month, you’ve heard new insights on this age-old pairing from the likes of Sounding Out! veteranos Aaron Trammell and Primus Luta along with new voices Andrew Salvati and Owen Marshall. These fast-forward folks have shared their thinking about everything from Auto-tune to techie manifestos. Today, Marshall helps us understand just why we want to shift pitch-time so darn bad. Wait, let me clean that up a little bit. . .so darn badly. . .no wait, run that back one more time. . .jjuuuuust a little bit more. . .so damn badly. Whew! There! Perfect!–JS, Editor-in-Chief

A recording engineer once told me a story about a time when he was tasked with “tuning” the lead vocals from a recording session (identifying details have been changed to protect the innocent). Polishing-up vocals is an increasingly common job in the recording business, with some dedicated vocal producers even making it their specialty. Being able to comp, tune, and repair the timing of a vocal take is now a standard skill set among engineers, but in this case things were not going smoothly. Whereas singers usually tend towards being either consistently sharp or flat (“men go flat, women go sharp” as another engineer explained), in this case the vocalist was all over the map, making it difficult to always know exactly what note they were even trying to hit. Complicating matters further was the fact that this band had a decidedly lo-fi, garage-y reputation, making your standard-issue, Glee-grade tuning job decidedly inappropriate.

Undaunted, our engineer pulled up the Auto-Tune plugin inside Pro Tools and set to work tuning the vocal, to use his words, “artistically” – that is, not perfectly, but enough to keep it from being annoyingly off-key. When the band heard the result, however, they were incensed – “this sounds way too good! Do it again!” The engineer went back to work, this time tuning “even more artistically,” going so far as to pull the singer’s original performance out of tune here and there to compensate for necessary macro-level tuning changes elsewhere.

"Melodyne screencap" by Flickr user Ethan Hein, CC BY-NC-SA 2.0

The product of the tortuous process of tuning and re-tuning apparently satisfied the band, but the story left me puzzled… Why tune the track at all? If the band was so committed to not sounding overproduced, why go to such great lengths to make it sound like you didn’t mess with it? This, I was told, simply wasn’t an option. The engineer couldn’t in good conscience let the performance go un-tuned. Digital pitch correction, it seems, has become the rule, not the exception, so much so that the accepted solution for too much pitch correction is more pitch correction.

Since 1997, recording engineers have used Auto-Tune (or, more accurately, the growing pantheon of digital pitch correction plugins for which Auto-Tune, Kleenex-like, has become the household name) to fix pitchy vocal takes, lend T-Pain his signature vocal sound, and reveal the hidden vocal talents of political pundits. It’s the technology that can make the tone-deaf sing in key, make skilled singers perform more consistently, and make MLK sound like Akon. And at 17 years of age, “The Gerbil,” as some like to call Auto-Tune, is getting a little long in the tooth (certainly by meme standards.) The next U.S. presidential election will include a contingent of voters who have never drawn air that wasn’t once rippled by Cher’s electronically warbling voice in the pre-chorus of “Believe.” A couple of years after that, the Auto-Tune patent will expire and its proprietary status will dissolve into the collective ownership of the public domain.


Growing pains aside, digital vocal tuning doesn’t seem to be leaving any time soon. Exact numbers are hard to come by, but it’s safe to say that the vast majority of commercial music produced in the last decade or so has most likely been digitally tuned. Future Music editor Daniel Griffiths has ballpark-estimated that, as early as 2010, pitch correction was used in about 99% of recorded music. Reports of its death are thus premature at best. If pitch correction seems banal it doesn’t mean it’s on the decline; rather, it’s a sign that we are increasingly accepting its underlying assumptions and internalizing the habits of thought and listening that go along with them.

Headlines in tech journalism are typically reserved for the newest, most groundbreaking gadgets. Often, though, the really interesting stuff only happens once a technology begins to lose its novelty, recede into the background, and quietly incorporate itself into fundamental ways we think about, perceive, and act in the world. Think, for example, about all the ways your embodied perceptual being has been shaped by and tuned-in to, say, the very computer or mobile device you’re reading this on. Setting value judgments aside for a moment, then, it’s worth thinking about where pitch correction technology came from, what assumptions underlie the way it works and how we work with it, and what it means that it feels like “old news.”

"Anti-Tune symbol"

As is often the case with new musical technologies, digital pitch correction has been the target for no small amount of controversy and even hate. The list of indictments typically includes the homogenization of music, the devaluation of “actual talent,” and the destruction of emotional authenticity. Suffice to say, the technological possibility of ostensibly producing technically “pitch-perfect” performances has wreaked a fair amount of havoc on conventional ways of performing and evaluating music. As Primus Luta reminded us in his SO! piece on the powerful-yet-untranscribable “blue notes” that emerged from the idiosyncrasies of early hardware samplers, musical creativity is at least as much about digging-into and interrogating the apparent limits of a technology as it is about the successful removal of all obstacles to total control of the end result.

Paradoxically, it’s exactly in this spirit that others have come to the technology’s defense: Brian Eno, ever open to the unexpected creative agency of perplexing objects, credits the quantized sound of an overtaxed pitch corrector with renewing his interest in vocal performances. SO!’s own Osvaldo Oyola, channeling Walter Benjamin, has similarly offered a defense of Auto-Tune as a democratizing technology, one that both destabilizes conventional ideas about musical ability and allows everyone to sing in-tune, free from the “tyranny of talent and its proscriptive aesthetics.”

"Audiodatenkompression: Manowar, The Power of Thy Sword" by Wikimedia user Moehre1992, CC BY-SA 3.0

Jonathan Sterne, in his book MP3, offers an alternative to normative accounts of media technology (in this case, narratives either of the decline or rise of expressive technological potential) in the form of “compression histories” – accounts of how media technologies and practices directed towards increasing their efficiency, economy, and mobility can take on unintended cultural lives that reshape the very realities they were supposed to capture in the first place. The algorithms behind the MP3 format, for example, were based in part on psychoacoustic research into the nature of human hearing, framed primarily around the question of how many human voices the telephone company could fit into a limited bandwidth electrical cable while preserving signal intelligibility. The way compressed music files sound to us today, along with the way in which we typically acquire (illegally) and listen to them (distractedly), is deeply conditioned by the practical problems of early telephony. The model listener extracted from psychoacoustic research was created in an effort to learn about the way people listen. Over time, however, through our use of media technologies that have a simulated psychoacoustic subject built-in, we’ve actually learned collectively to listen like a psychoacoustic subject.

Pitch-time manipulation runs largely in parallel to Sterne’s bandwidth compression story. The ability to change a recorded sound’s pitch independently of its playback rate had its origins not in the realm of music technology, but in efforts to time-compress signals for faster communication. Instead of reducing a signal’s bandwidth, pitch manipulation technologies were pioneered to reduce the time required to push the message through the listener’s ears and into their brain. As early as the 1920s, the mechanism of the rotating playback head was being used to manipulate pitch and time interchangeably. By spinning a continuous playback head relative to the motion of the magnetic tape, researchers in electrical engineering, educational psychology, and pedagogy of the blind found that they could increase playback rate of recorded voices without turning the speakers into chipmunks. Alternatively, they could rotate the head against a static piece of tape and allow a single moment of recorded sound to unfold continuously in time – a phenomenon that influenced the development of a quantum theory of information.

In the early days of recorded sound some people had found a metaphor for human thought in the path of a phonograph’s needle. When the needle became a head and that head began to spin, ideas about how we think, listen, and communicate followed suit: In 1954 Grant Fairbanks, the director of the University of Illinois’ Speech Research Laboratory, put forth an influential model of the speech-hearing mechanism as a system where the speaker’s conscious intention of what to say next is analogized to a tape recorder full of instructions, its drive “alternately started and stopped, and when the tape is stationary a given unit of instruction is reproduced by a moving scanning head” (136). Pitch-time changing was more a model for thinking than it was for singing, and its imagined applications were thus primarily non-musical.

Take for example the Eltro Information Rate Changer. The first commercially available dedicated pitch-time changer, the Eltro advertised its uses as including “pitch correction of helium speech as found in deep sea; Dictation speed testing for typing and steno; Transcribing of material directly to typewriter by adjusting speed of speech to typing ability; medical teaching of heart sounds, breathing sounds etc. by slow playback of these rapid occurrences.” (It was also, incidentally, used by Kubrick to produce the eerily deliberate vocal pacing of HAL 9000.) In short, for the earliest “pitch-time correction” technologies, the pitch itself was largely a secondary concern, of interest primarily because it was desirable for the sake of intelligibility to pitch-change time-altered sounds into a more normal-sounding frequency range.
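The coupling between playback rate and pitch described above can be put in numbers. What follows is only a minimal sketch of the underlying arithmetic, not a description of how the Eltro or any other device was built: it assumes naive speed-up (simply running the recording faster) and the standard equal-tempered relation of 12 semitones per doubling of frequency. The function names are hypothetical and chosen for illustration.

```python
import math


def pitch_shift_semitones(rate: float) -> float:
    """Pitch change, in semitones, caused by naive playback at `rate` times normal speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Doubling the speed doubles every frequency, i.e. raises pitch by one octave (12 semitones).
    return 12.0 * math.log2(rate)


def compensating_ratio(rate: float) -> float:
    """Frequency ratio a pitch changer would need to apply to bring the voice back to normal."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate


def compressed_duration(seconds: float, rate: float) -> float:
    """Listening time left after time compression at `rate` times normal speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return seconds / rate


if __name__ == "__main__":
    rate = 1.5  # illustrative value, not taken from any historical device
    print(f"Naive {rate}x playback raises pitch by {pitch_shift_semitones(rate):.2f} semitones")
    print(f"A pitch changer would scale frequencies by {compensating_ratio(rate):.3f} to compensate")
    print(f"A 60-second message would take {compressed_duration(60, rate):.0f} seconds")
```

Read this way, the "chipmunk" effect is just the first function's output left uncorrected; the pitch changer's job in these early applications was to cancel it so that the time savings in the third function came without a loss of intelligibility.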


This coupling of time compression with pitch changing continued well into the era of digital processing. The Eventide Harmonizer, one of the first digital hardware pitch shifters, was initially used to pitch-correct episodes of “I Love Lucy” which had been time-compressed to free up broadcast time for advertising. Similar broadcast time compression techniques have proliferated and become common in radio and television (see, for example, David Foster Wallace’s account of the “cashbox” compressor in his essay on an LA talk radio station). Speed listening technology initially developed for the visually impaired has similarly become a way of producing the audio “fine print” at the end of radio advertisements.

"H910 Harmonizer" by Wikimedia user Nalzatron, CC BY-SA 3.0

Though the popular conversation about Auto-Tune often leaves this part out, it’s hardly a secret that pitch-time correction is as much about saving time as it is about hitting the right note. As Auto-Tune inventor Andy Hildebrand put it,

[Auto-Tune’s] largest effect in the community is it’s changed the economics of sound studios…Before Auto-Tune, sound studios would spend a lot of time with singers, getting them on pitch and getting a good emotional performance. Now they just do the emotional performance, they don’t worry about the pitch, the singer goes home, and they fix it in the mix.

Whereas early pitch-shifters aimed to speed-up our consumption of recorded voices, the ones now used in recording are meant to reduce the actual time spent tracking musicians in studio. One of the implications of this framing is that emotion, pitch, and the performer take on a very particular relationship, one we can find sketched out in the Auto-Tune patent language:

Voices or instruments are out of tune when their pitch is not sufficiently close to standard pitches expected by the listener, given the harmonic fabric and genre of the ensemble. When voices or instruments are out of tune, the emotional qualities of the performance are lost. Correcting intonation, that is, measuring the actual pitch of a note and changing the measured pitch to a standard, solves this problem and restores the performance. (Emphasis mine. Similar passages can be found in Auto-Tune’s technical documentation.)
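The "measure, then change to a standard" step in the patent language can be illustrated with a rough sketch. This is not Auto-Tune's algorithm, which the passage does not describe; it assumes that the "standard pitches" are twelve-tone equal temperament with A4 at 440 Hz, and that the pitch has already been measured somehow. The `strength` parameter is a hypothetical knob added to show what a partial correction might look like.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; the "standard" is itself a convention
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_standard_pitch(freq_hz: float, a4_hz: float = A4_HZ):
    """Return (note name, target frequency in Hz, deviation in cents) for a measured pitch."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    # Distance from A4 in (fractional) semitones.
    semitones = 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones)
    target_hz = a4_hz * 2.0 ** (nearest / 12.0)
    cents_off = 100.0 * (semitones - nearest)
    midi = 69 + nearest  # MIDI note number, with A4 = 69
    name = f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
    return name, target_hz, cents_off


def correct(freq_hz: float, strength: float = 1.0) -> float:
    """Move a measured pitch toward the nearest standard pitch.

    strength=1.0 snaps fully to the standard; strength=0.0 leaves the pitch untouched.
    (Hypothetical parameter, for illustration only.)
    """
    _, target_hz, cents_off = nearest_standard_pitch(freq_hz)
    remaining_cents = cents_off * (1.0 - strength)
    return target_hz * 2.0 ** (remaining_cents / 1200.0)


if __name__ == "__main__":
    sung = 450.0  # a slightly sharp A4, illustrative value
    name, target, cents = nearest_standard_pitch(sung)
    print(f"{sung} Hz is {cents:+.1f} cents from {name} ({target:.1f} Hz)")
    print(f"Half-strength correction: {correct(sung, 0.5):.2f} Hz")
```

Even in this toy version, everything that counts as "out of tune" is fixed in advance by the choice of reference grid: a pitch is only ever measured as a deviation from the nearest expected note.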

In the world according to Auto-Tune, the engineer is in the business of getting emotional signals from place to place. Emotion is the message, and pitch is the medium. Incorrect (i.e. unexpected) pitch therefore causes the emotion to be “lost.” While this formulation may strike some people as strange (for example, does it mean that we are unable to register the emotional qualities of a performance from singers who can’t hit notes reliably? Is there no emotionally expressive role for pitched performances that defy their genre’s expectations?), it makes perfect sense within the current division of labor and affective economy of the recording studio. It’s a framing that makes it possible, intelligible, and at least somewhat compulsory to have singers “express emotion” as a quality distinct from the notes they hit and have vocal producers fix up the actual pitches after the fact. Both this emotional model of the voice and the model of the psychoacoustic subject are useful frameworks for the particular purposes they serve. The trick is to pay attention to the ways we might find ourselves bending to fit them.


Owen Marshall is a PhD candidate in Science and Technology Studies at Cornell University. His dissertation research focuses on the articulation of embodied perceptual skills, technological systems, and economies of affect in the recording studio. He is particularly interested in the history and politics of pitch-time correction, cybernetics, and ideas and practices about sensory-technological attunement in general. 

Featured image: “Epic iPhone Auto-Tune App” by Flickr user Photo Giddy, CC BY-NC 2.0

Interview with Jonathan Sterne



This podcast provokes Jonathan Sterne to jam on the history of Sound Studies, critique the soundscape, and talk about MP3s. That said, it was really just a way to talk about his super-cool music projects (really, check them out!). Aaron Trammell interviews Jonathan Sterne, and digs deep into the questions at the core of our discipline.

Jonathan Sterne teaches in the Department of Art History and Communication Studies and the History and Philosophy of Science Program at McGill University. He is author of The Audible Past: Cultural Origins of Sound Reproduction (Duke, 2003) and MP3: The Meaning of a Format (Duke, 2012), as well as numerous articles on media, technologies and the politics of culture. He is also editor of The Sound Studies Reader (Routledge, 2012).

