
Echo and the Chorus of Female Machines


Editor’s Note: February may be over, but our forum is still on! Today I bring you installment #5 of Sounding Out!’s blog forum on gender and voice. Last week Art Blake talked about how his experience shifting his voice from feminine to masculine as a transgender man intersects with his work on John Cage. Before that, Regina Bradley put the soundtrack of Scandal in conversation with race and gender. The week before I talked about what it meant to have people call me, a woman of color, “loud.” That post was preceded by Christine Ehrick’s selections from her forthcoming book on the gendered soundscape. We have one more left! Robin James will round out our forum with an analysis of how ideas of what women should sound like have roots in Greek philosophy.

This week Canadian artist and writer AO Roberts takes us into the arena of speech synthesis and makes us wonder about what it means that the voices are so often female. So, lean in, close your eyes, and don’t be afraid of the robots’ voices. –Liana M. Silva, Managing Editor

I used Apple’s SIRI for the first time on an iPhone 4S. After hundreds of miles in a van full of people on a cross-country tour, all of the music had been played and the comedy mp3s entirely depleted. So, like so many first time SIRI users, we killed time by asking questions that went from the obscure to the absurd. Passive, awaiting command, prone to glitches: there was something both comedic and insidious about SIRI as female-gendered program, something that seemed to bind up the technology with stereotypical ideas of femininity.

"Maria the Maschinenmensch" by Flickr user Timothy Rose, CC BY-NC-SA 2.0


Speech synthesis is the artificial simulation of the human voice through hardware or software, and SIRI is but one incarnation of the historical chorus of machines speaking what we code to be female. From the early 20th century Voder, to the Cold War-era Silvia and Audrey, up to Amazon’s newly released Echo, researchers have by and large developed these applications as female personae. Each program articulates an individual timbre and character, soothing, soft-spoken, or matter-of-fact; this is your mother, sister, or lover, here to affirm your interests while reminding you about that missed birthday. She is easy to call up in memory, tones rounded at the edges, like Scarlett Johansson’s smoky conviviality as Samantha in Spike Jonze’s Her, a bodiless purr. Simulated speech articulates a series of assumptions about what neutral articulation is, what a female voice is, and whose voice technology can ventriloquize.

The ways computers hear and speak the human voice are as complex as they are rapidly expanding. But in robotics gender is charted down to actual frequency, actively policed around 100-150 Hz (male) and 200-250 Hz (female). Now prevalent in entertainment, navigation, law enforcement, surveillance, security, and communications, speech synthesis and recognition hold up an acoustic mirror to the dominant cultures from which they materialize. While they might provide useful tools for everything from time management to self-improvement, they also reinforce cisheteronormative definitions of personhood. Like the binary code that now gives it form, the development of speech recognition separated the entire spectrum of vocal expression into rigid biologically based categories. Ideas of a real voice vs. fake voice, in all their resonances with passing or failing one’s gender performance, have through this process been designed into the technology itself.

 

A SERIES OF MISERABLE GRUNTS

“Kempelen Speakingmachine” by Fabian Brackhane (Quintatoen), Saarbrücken – Own work. Licensed under Public Domain via Wikimedia Commons


The first voice to be synthesized was a reed and bellows box invented by Wolfgang Von Kempelen in 1791 and shown off in the courts of the Hapsburg Empire. Von Kempelen had gained renown for his chess-playing Turk, a racist cartoon of an automaton that made waves amongst the nobles until it was revealed that underneath the tabletop was a small man secretly moving the chess player’s limbs. Von Kempelen’s second work, the speaking machine, wowed its audiences thoroughly. The player wheedled and squeezed the contraption, pushing air through its reed larynx to replicate simple words like mama and papa.

Synthesizing the voice has always required some level of making strange, of phonemic abstraction. Bell Laboratories originally developed The Voder, the earliest incarnation of the vocoder, as a cryptographic device for WWII military communications. The machine split the human voice into a spectral representation, fragmenting the source into a number of different frequencies that were then recombined into synthetic speech. Noise and unintelligibility shielded the Allies’ phone calls from Nazi interception. The Vocoder’s developer, Ralph Miller, bemoaned the atrocities the machine performed on language, reducing it to a “series of miserable grunts.”

From website Binary Heap


In his history of the Vocoder, How to Wreck a Nice Beach, Dave Tompkins tells how the apparatus originally took up an entire wall and was played solely by female phone operators, but the pitch of the female voice was said to be too high to be heard by the nascent technology. In fact, when it debuted at the 1939 World’s Fair, only men were chosen to experience the roboticization of their voice. The Voder was, in fact, originally created to only hear pitches in the range of 100-150 Hz, a designed exclusion from the start. So when the Signal Corps of the Army convinced President Eisenhower to call his wife via Voder from North Africa, Miller and the developers panicked for fear she wouldn’t be heard. Entering the Pentagon late at night, Mamie Eisenhower spoke into the telephone and a fragmented version of her words travelled across the Atlantic. Resurfacing in angular vocoded form, her voice urged her husband to come home, and he had no problem hearing her. Instead of giving the developers pause to question their own definitions of gender, this interaction is told as a derisive footnote in the history of sound and technology: the punchline being that the first lady’s voice was heard because it was as low as a man’s.

 

WAKE WORDS

Screen shot of Amazon's page for Echo


In fall 2014 Amazon launched Echo, their new personal assistant device. Echo is a 12-inch-long plain black cone that stands upright on a tabletop, similar in appearance to a telephoto camera lens. Equipped with far-field mics, Echo has a female voice, connected to the cloud and always on standby. Users engage Echo with their own chosen ‘wake’ word. The linguistic similarity to a BDSM safe word could have been lost on developers. Although here inverted, the word is used to engage rather than halt action, awakening an instrument that lies dormant awaiting command.

Amazon’s much-parodied promotional video for Echo is narrated by the innocent voice of the youngest daughter in a happy, straight, white, middle-class family. While the son pitches Oedipal jabs at the father for his dubious role as patriarchal translator of technology, each member of the family soon discovers the ways Echo is useful to them. They name it Alexa and move from questions like: “Alexa how many teaspoons in a tablespoon” and “How tall is Mt. Everest?” to commands for dance mixes and cute jokes. Echo enacts a hybrid role as mother, surrogate companion, and nanny of sorts not through any real aspects of labor but through the intangible contribution of information. As a female-voiced oracle in the early pantheon of the Internet of Things, Echo’s use value is squarely placed in the realm of cisheteronormative domestic knowledge production. Gone are the tongue-in-cheek existential questions proffered to SIRI upon its release. The future with Echo is clean, wholesome, and absolutely SFW. But what does it mean for Echo to be accepted into the home, as a female gendered speaking subject?

Concerns over privacy and surveillance quickly followed Echo’s release, alarms mostly sounding over its “always on” function. Amazon banks on the safety and intimacy we culturally associate with the female voice to ease the transition of robots and AI into the home. If the promotional video painted an accurate picture of Echo’s usage, it would appear that Amazon had successfully launched Echo as a bodiless voice over the uncanny valley, the chasm below littered with broken phalanxes of female machines. Masahiro Mori coined the now familiar term uncanny valley in 1970 to describe the dip in empathic response to humanoid robots as they approach realism.

If we listen to the litany of reactions to robot voices through the filters of gender and sexuality it reveals the stark inclines of what we might think of as a queer uncanny valley. Paulina Palmer wrote in The Queer Uncanny about reoccurring tropes in queer film and literature, expanding upon what Freud saw as a prototypical aspect of the uncanny: the doubling and interchanging of the self. In the queer uncanny we see another kind of rift: that between signifier and signified embodied by trans people, the tearing apart of gender from its biological basis. The non-linear algebra of difference posed by queer and trans bodies is akin to the blurring of divisions between human and machine represented by the cyborg. This is the coupling of transphobic and automatonophobic anxieties, defined always in relation to the responses and preoccupations of a white, able bodied, cisgendered male norm. This is the queer uncanny valley. For the synthesized voice to function here, it must ease the chasm, like Echo: sutured by a voice coded as neutral, but premised upon the imagined body of a white, heterosexual, educated middle class woman.

 

22% FEMALE

My own voice spans a range that would have dismayed someone like Ralph Miller. I sang tenor in Junior High choir until I was found out for straying, and then warned to stay properly in the realms of alto, but preferably soprano range. Around the same time I saw a late night feature of Audrey Hepburn in My Fair Lady, struggling to lose her crass proletariat inflection. So I, a working class gender ambivalent kid, walked around with books on my head muttering The Rain In Spain Falls Mainly on the Plain for weeks after. I’m generally loud, opinionated and people remember me for my laugh. I have sung in doom metal and grindcore punk bands, using both screeching highs and the growling “cookie monster” vocal technique mostly employed by cismales.

Given my own history of toying with and estrangement from what my voice is supposed to sound like, I was interested to try out a new app on the market, the Exceptional Voice App (EVA), touted as “The World’s First and Only Transgender Voice Training App.” Functioning as a speech recognition program, EVA analyzes the pitch, respiration, and character of your voice with the stated goal of providing training to sound more like one’s authentic self. Behind EVA is Kathe Perez, a speech pathologist and businesswoman, the developer and provider of code to the circuit. And behind the code is the promise of giving proper form to rough sounds, pitch-perfect prosody, safety, acceptance, and wholeness. Informational and training videos are integrated with tonal mimicry for phrases like hee, haa, and ooh. User progress is rated and logged with options to share goals reached on Twitter and Facebook. Customers can buy EVA for Gals or EVA for Guys. I purchased the app online for my iPhone for $5.97.

My initial EVA training scores informed me I was 22% female; a recurring number I receive in interfaces with identity recognition software. Facial recognition programs consistently rate my face at 22% female. If I smile I tend to get a higher female response than my neutral face, coded and read as male. Technology is caught up in these translations of gender: we socialize women to smile more than men, then write code for machines to recognize a woman in a face that smiles.

As for EVA’s usage, it seems to be a helpful pedagogical tool with more people sharing their positive results and reviews on trans forums every day. With violence against trans people persisting—even increasing—at alarming rates, experienced worst by trans women of color, the way one’s voice is heard and perceived is a real issue of safety. Programs like EVA can be employed to increase ease of mobility throughout the world. However, EVA is also out of reach to many, a classed capitalist venture that tautologically defines and creates users with supply. The context for EVA is the systems of legal, medical, and scientific categories inherited from Foucault’s era of discipline; the predetermined hallucination of normal sexuality, the invention of biological criteria to define the sexes and the pathologization of those outside each box, controlled by systems of biopower.

Despite all these tools we’ll never really know how we sound. It is true that the resonant chamber of our own skull provides us with a different acoustic image of our own voice. We hate to hear our voice recorded because suddenly we catch a sonic glimpse of what other people hear: sharper, more angular tones, higher pitch, less warmth. Speech recognition and synthesis work upon the same logic, the shifting away from interiority; a just-off-the-mark approximation. So the question remains: what would a gender variant voice synthesis and recognition sound like? How much is reliant upon the technology and how much depends upon individual listeners, their culture, and what they project upon the voice? As markets grow, so too have more internationally accented English dialects been added to computer programs with voice synthesis. Thai, Indian, Arabic and Eastern European English were added to Mac OS X Lion in 2011. Can we hope to soon offer our voices to the industry not as a set of data to be mined into caricatures, but as a way to assist in the opening up of gender definitions? We would be better served to resist the urge to chime in and listen to the field in the same way we suddenly hear our recorded voice played back, with a focus on the sour notes of cold translation.

Featured image: “Golden People love Gold Jewelry Robots” by Flickr user epSos.de, CC BY 2.0

AO Roberts is a Canadian intermedia artist and writer based in Oakland whose work explores gender, technology and embodiment through sound, installation and print. A founding member of Winnipeg’s NGTVSPC feminist artist collective, they have shown their work at galleries and festivals internationally. They have also destroyed their vocal cords, played bass and made terrible sounds in a long line of noise projects and grindcore bands, including VOR, Hoover Death, Kursk and Wolbachia. They hold a BFA from the University of Manitoba and an MFA in Sculpture from California College of the Arts.

REWIND!…If you liked this post, you may also dig:

Hearing Queerly: NBC’s “The Voice”—Karen Tongson

On Sound and Pleasure: Meditations on the Human Voice—Yvon Bonenfant

I Been On: BaddieBey and Beyoncé’s Sonic Masculinity—Regina Bradley

Reproducing Traces of War: Listening to Gas Shell Bombardment, 1918


Welcome to World Listening Month 2014, our annual forum on listening in observation of World Listening Day on July 18th, 2014. World Listening Day is a time to think about the impacts we have on our auditory environments and, in turn, their effects on us [for the full deets, peep our recent SO! Amplifies post by Eric Leonardson, Executive Director of the World Listening Project]. We kick off our month of thinking critically about listening with a post by media historian Brian Hanrahan, who listens deeply to sonic traces of the past to prompt us to question our desires for contemporary media representations of “reality.” It also marks the global 100 year anniversary of World War I this August 2014: a moment of silence. –J. Stoever, Editor-in-Chief

For some reason that I don’t fully understand, I am very emotionally moved by the space around a sound. I almost think that sometimes I am recording space with a sound in it, rather than sound in a space. -Walter Murch 

If you want to listen to the past, there’s never been a time like the present. Every year, it seems, new old recordings are identified, new techniques developed to recover sounds thought irrecoverable. Here is Bismarck’s voice, preserved on a cylinder in 1889.  Here, older still, is Edison’s. There is the astonishing recuperation of phonautograms – reverberation traced onto soot-blackened paper in the mid-nineteenth century, digitally processed and played back in our own. But as that processing underlines, no sound recording straightforwardly reproduces the real. An acoustic artifact is a compound of materiality, form and meaning, but also a place where technology meets desire. Old recordings meet the listener’s longing halfway; they invoke a reality always out of reach. And not simply a longing to hear, but also to touch, and be moved by, the fact of an absent existence.

Take, for instance, HMV 09308. In October 1918, just before the end of the Great War, William Gaisberg, a sound recordist of the pre-electric era, took recording equipment to the Western Front in order to capture the sound of British artillery shelling German lines with poison gas. Gaisberg died not long after, probably from Spanish flu, although some say he was weakened by gas exposure during the recording. Nonetheless the “Gas Shell Bombardment” record – a 12-inch HMV shellac disc, just over 2 minutes at 78 rpm – was released a few weeks later, just as the war came to an end. Initially intended to promote War Bonds, ultimately the record was used to raise money for disabled veterans.


For decades, the HMV recording had a reputation as one of the very earliest “actuality” recordings – one documenting a real location and event beyond the performative space of the studio, imprinted with the audible material trace of an actual moment in space and time. Documents like this – no matter what the technology – usually come with additional symbolic authentication. Here, the record’s label does some of that work. This “historic recording,” says the subtitle, is an “actual record taken on the front line.” Publicity pieces drove home the message. In the popular HMV magazine The Voice, Gaisberg – or probably his posthumous ghost-writer – described the expedition in detail, claiming the track to be a “true representation of the bombardment.”


In the same issue, a Major C.J.C. Street compared the recording to his own experience on the Front. “Its realism,” he wrote, “took my breath away… I played the record many times… finding at each attempt some well-remembered detail.” He didn’t say so in his article, but Street – an artillery officer, a novelist and a propaganda man for the intelligence agency MI7 – was in fact the impresario of the record. This was not the first time he had found astute uses for sound media. The previous year he had put together a record that set artillery drill commands to popular tunes – the recording was both a propaganda release and an army training tool for new recruits. With the Gas Shell record, Street knew he wasn’t just selling recorded sound, but also an auratic sense of closeness to an overwhelming reality, the palpable proximity of war and death. Authenticating detail helped to underpin this sense of an absent real made present. Street cued the listener for those “well-remembered details.” In particular, he singled out one indistinct rattly flap-whizz noise, hearing in it, he claimed, the sound of a round with a “loose driving-band.”

The record stayed in the HMV catalog until 1945, but only in the early 1990s were its production history and authenticity claims seriously examined. In specialist journals, archivists, collectors and amateur historians undertook a collective forensic and critical analysis. A promising auditory witness was located: 95-year-old Lt.-Col. Montagu Cleeve – another former artillery officer, in his time a developer of the “Boche Buster” railway gun, later a music professor – was invited to critically assess the recording. Cleeve vouched unreservedly for its authenticity. He heard in it, he said, an unmistakable succession of sounds – the clang of the breech, the gigantic report of the firing explosion, the distinctive whiny whistle of a gas shell on its way across no-man’s-land. Others looked to data rather than the memories of old soldiers. One expert on pre-electric recording noted the angles commanded in firing instructions, correlated them with known muzzle velocities for 4.5 and 6-inch howitzers, then used this and other information to “definitively” explain the counter-intuitive anti-Doppler sound of the shells’ whistling. He also identified the audible echo effect – the curious “double report” of the guns heard here – as the sound of a brass recording horn violently resonating at a distance of exactly 26.5 meters from the guns.

 

Peter Adamson, “The Gas Shell Bombardment record,” The Historic Record Quarterly, April 1991.


 

Eventually, skepticism won out. Close listening at slow speeds – just careful attention and notation, nothing more elaborate – revealed inconsistencies and oddities in the firing noises. The bongs, plops and whistles seemed internally inconsistent. Some of the artillery sounds – ostensibly a battery of four, firing in quick succession – varied implausibly with each successive firing. Physical evidence from the record’s groove, as well as extraneous noises – surface crackle and fizz, and, audible within the recording, the swish of a turntable – seemed to indicate at least two rudimentary overdubs, in which the output of one acoustic horn was relayed into a second, possibly using an auxetophone, an early compressed-air amplifier. All this resulted in a double- or triple-layered sonic artifact. Finally – the crucial evidence, although oddly it was hardly noticed at the time – an alternative take was located. In this take, according to its discoverer, the entire theatrics of gunnery command is simply absent, and there is no sound at all of whistling shells in motion. What was left was a skeleton sequence of clicks, thuds and cracks, supplemented with only a single closing insert, the portentous injunction “Feed the Guns with War Bonds!”

In short, it seems highly likely that any original field recording was, at the very least, post-dramatized with performed voices and percussive and whistling sound effects. So, it is tempting to say, that clears that up. The recording’s inauthenticity is proven. File under Fake. But in fact, if we don’t stop there, if we set aside narrow and absolutist ideas of authenticity, and instead explore the recording’s ambiguity and hybridity, then Gas Shell Bombardment becomes all the more interesting as an historical artifact.

Let’s assume, for the sake of argument, that some form of basic recording was done in France, very possibly a staged barrage specifically performed for Gaisberg’s visit, and that this recording then had effects added back at HMV in London. The record might then be seen less as a straightforward documentary, and instead as an unusual version of the “descriptive speciality,” a genre of miniature phonographic vignette dating back to the 1890s, far predating longer-form radio drama. Very little is known about these early media artworks, but it is a fair generalization to say that in America the genre was more slanted towards vaudeville comedy, whereas in Europe, imperial and military scenes predominated. As early as 1890, for example, there had been German phonographic representations of battles from the Franco-Prussian war. The Great War saw a flourishing of the genre. Scholars are just beginning to take an interest in these old phonographs; here’s one recent essay on the “Angel of Mons,” for example, a British acoustic vignette of a famous incident on the Western Front.

Listen to a 1915 German descriptive speciality, depicting the attack on the fortress of Liège the previous year:


As a descriptive speciality, Gas Shell Bombardment is unusual because it incorporates an actual indexical trace. But such traces – as emphasized by Charles Sanders Peirce and many later media-theoreticians – do not resemble their referent, they are caused by it. The bullet hole does not look much like a bullet; thunder is lightning’s trace, not its likeness. But for Street and Gaisberg, the trace’s lack of resemblance caused problems: the original recording’s lack of detail, cues and clues, but above all its lack of internal dimensionality, created a perceptual shortfall and a lack of credibility. Maybe they hoped that the guns, by sheer force of amplitude, would overcome the spatially impoverished, reverbless reproduction of pre-electric recording. If so, it didn’t work. Without added effects, the guns’ trace was as flat and “body-less” as a sequence of Morse. It was a sound without a scene. The producers’ interventions aimed to thicken the primary artifact with referential-sounding detail, but also to heighten the sense of materiality and spatiality, and to strengthen the sense of diegetic presence, of worlded thereness. The soldiers’ voices – louder and quieter, close-up and farther-out – and the fake-Doppler of the “shell whistling” lent the recording narrative direction (literally, some trajectory) and “authenticating” points of detail. But above all they gave a sense of internal space to the recording, a space into which the listener could direct her attention.

In this context, we can only admire the creativity and performative élan of the unknown production crew. We know little about effects production in early phonography. It is a safe bet that some techniques were adopted from theatre, and that there was overlap with silent film accompaniment. But whatever the method used, it would have called for the awkward orchestration of a limited number of iconic sounds to create an impression of a spatially coherent and materially detailed sonic environment. The recordist and his team would first have had to imagine how relative loudness – of voices, of material objects struck and sounded – might create a sense of spatial depth when transduced through the horn’s crude interface. Then they would have had to perform this as a live overdub, keeping time with the base track of the gun recording played through another horn. And all this done with participants and equipment crowded tightly around the mouth of the huge horn, crammed into the tiny pick-up arc, a scene looking something like this image of Leopold Stokowski’s pre-electric recording sessions or this photograph of the recording of a cello concerto.

Acoustic recording session with Elgar and Beatrice Harrison, 1920


As well as this hybrid of trace and live performance, there is another performance here – Gaisberg’s journey itself. With twenty years of recording experience, Gaisberg was probably very well aware that the expedition would not yield a “realistic” recording of the guns. But the expedition had to be made, so that it could be said to have taken place. Expectations had to be primed and colored, so that, to use André Bazin’s famous phrase about photographs, the recording could partake in an “irrational power to… bear the belief” of the listener. The journey, and the accounts of Gaisberg and Street are not a supplement to the “true representation” of the gas bombardment. They are part of that representation. Moreover, in subsequent writing it is noticeable that the manner of Gaisberg’s death becomes a rhetorical amplification for the authenticity of the recording’s trace, as if his fatal inhalation (of gas molecules or flu bacilli) were itself a deadly indexation, paralleling the recording’s claim to capture the breath of the War, and even of History itself.

In media-historical terms, the Gas Shell Bombardment recording can be understood as a late, transitional artifact from phonography’s pre-microphonic era. The desire for the sonic trace, for an ever more immersive proximity to events was there, but electro-acoustic technology was not yet in place. Two years later, in 1920, Horace Merriman and Lionel Guest made the first experimental electrical recording, arguably also the first true field recording. The event, appropriately enough, was an official war memorial service in London, where Merriman and Guest – working for Columbia Records – put microphones in Westminster Abbey, running cables to a remote recording van parked in the street outside, where they sat amidst heating ovens and cutting lathes. By the end of the 1920s, remote recording and broadcasting, while never straightforward, were well on the way to ubiquity.

Illustrated London News, 1920.


Claims made on behalf of technologies of reproduction may seem simplistic, but there’s a grain of truth to their simplicity. If there were nothing special – even magical – in the referentiality of the camera that captures the moment, the recording that’s like being there, the liveness of the live broadcast, these things would not play the role they do in everyday life and in the ideological fabric of society. But there is falsehood too, in over-simplifying the nature and affective charge of old photographs, old footage, old recordings. These are made things, composed of different materials, media, signs and conventions; they are inseparable from the desires and expectations they induce and direct. They function in part by mimesis and verisimilitude, but also through the gaps, blank spots and false illusions of their trace. They can – rightly – intensify our feeling towards the past, but should also prompt us to think about our own desires and investments.

Image by Flickr User DrakeGoodman, “Horchposten im Spengtrichter vor Neuve-Chapelle 6km nördlich von La Bassée Nordfrankreich 1916,” A trio of lightly equipped soldiers from an unidentified formation oblige the photographer by looking serious and pretending they’re just metres from the enemy, listening for activity in his lines. The improvised “listening device” is actually a large funnel, probably liberated from a nearby farm.

Brían Hanrahan is a film, media and cultural historian, whose work focuses on the history of acoustic media, German and European cinema and the culture of the Weimar Republic.

Edited post-publication at 8:00 pm EST on July 7, 2014


REWIND!…If you liked this post, you may also dig:

SO! Reads: Susan Schmidt Horning’s Chasing Sound: Technology, Culture and the Art of Studio Recording from Edison to the LP–Enongo Lumumba-Kasongo

A Brief History of Auto-Tune–Owen Marshall

DIY Histories: Podcasting the Past–Andrew Salvati

  

This Is How You Listen: Reading Critically Junot Díaz’s Audiobook


Last month, T.M. Luhrmann compared the experience of reading a written book versus listening to books in the New York Times article “Audiobooks and the Return of Storytelling.” Luhrmann points out how audiobook sales jumped 20% in 2012, whereas total industry book sales went down 1%. From the looks of it, books have benefited from audiobook sales, but in literary studies, print remains the primary vehicle for analysis. Might listening to an audiobook actually change how we critically read a text?

As I listened to Junot Díaz narrate This Is How You Lose Her  (2012), the first book Díaz has read as an audiobook and the first book of short stories the author has published since 1996’s Drown, I wondered how his reading influenced how I interpreted the text. Díaz’s reading sounds less like regular speech and more like a performance, with its own cadence and rhythm:

This post approaches the audiobook as a text in itself, coming from a sound studies perspective. I attempt to conceptualize the idea of “close listening” as a methodology akin to “close reading” in literary studies. I listen for how Díaz reads the text but more specifically how the reading itself becomes a way of authoring the text. Ultimately, I argue that Díaz’s reading becomes a re-authoring of the text, a sonic re-writing. On a broader level, I hope to add to the conversation of what it means to read an audiobook, as Birgitte Stougaard Pedersen and Iben Have have brought up in “Conceptualising the Audiobook Experience.” Using This Is How You Lose Her, I show that reading an audiobook means engaging with the text from the angle of the ear, and that close listening can become an aural reading practice that relies not so much on the visual text as on aural cues from the narrator.

Not one but two (!) copies of This Is How You Lose Her

This Is How You Lose Her revolves around Yunior, a young Dominican immigrant who grows up in New Jersey and who ends up as a professor in Boston, and the many loves he has had or that he has encountered growing up. The stories trace his progress from a young, recently arrived Yunior, to a tenured, mature Yunior, showcasing certain relationships that influence how he relates to women—in sum, illustrating how he loses the women he loves. Throughout the short story collection, Díaz also calls attention to other relationships that may influence Yunior’s perspective, for example, his brother’s attachments with women, especially toward the end of his young life as he battled cancer, and his father’s relationship with his mistress, a Dominican woman who lived in New Jersey. At the end, Díaz illuminates how a mujeriego (womanizer) like Yunior comes to be; the short stories indicate that Yunior is as much a product of his environment as he is a seller of the merchandise.

Díaz is not a professional audiobook narrator. Although Díaz has done live readings, reading the full-length version of a book one has written is a different exercise. The Penguin Audio version of the collection is based on the actual short story collection (in other words, unabridged), so it does not contain additional stories or behind the scenes interviews. Technically, it is no different than the print version.

Listening to authors read their own work has value beyond the pleasure of hearing them read their text. Scholarly writing on audiobooks has emphasized the experience of listening to an audiobook for pleasure (like Deborah Phillips’ “Talking Books: The Encounter of Literature and Technology in the Audiobook” and James Shokoff’s “What Is An Audiobook?”), but it wasn’t until the 2011 edited collection Audiobooks, Literature, and Sound Studies that audiobooks were considered on their own instead of as extensions of the literature they were based on. The allure of doing this scholarly exercise with the audiobook version of This Is How You Lose Her is that Díaz’s delivery of the text is uncommon, to say the least.

“Junot Diaz at the Southern Festival of Books” by Flickr user Stacey Kizer, CC BY-NC 2.0

Talking about Junot Díaz’s readerly voice requires tuning into conversations about his writerly voice. In many reviews of Díaz’s books, writers discuss how Díaz deftly conveys a writer’s voice in his text, indicating that his success is that his characters have a very clear voice—or at least Yunior does. Michiko Kakutani, for example, points out how “Junot Díaz has one of the most distinctive and magnetic voices in contemporary fiction: limber, streetwise, caffeinated and wonderfully eclectic, capable of conjuring for the reader everything from the sorrows of Dominican history to the banalities of life in New Jersey.” Although this quotation is in reference to Díaz’s second book, The Brief Wondrous Life of Oscar Wao, it describes Díaz’s writing in terms of his voice instead of, for instance, in terms of his use of metaphors or choice of subject.

Richard Wolinsky, in his Guernica interview with Díaz, sees an overlap between Yunior and Díaz: “He’s [Yunior] got a very distinct voice, and it’s a voice that’s informed by [Diaz’s] own reading, particularly science fiction and fantasy.” Although Díaz has pointed out that Yunior is loosely based on events that have happened to him, Wolinsky “hears” Díaz in his main character. The tone and the language Yunior uses are read as a reflection of Díaz.

Conversations about the voice of the writer point to a sensibility about sound, but are often limited to a written text. Anna Barnet, in an interview with Junot Díaz, states “His two principal linguistic registers (‘this kind of crazy Caribbean language and music’ and ‘this sort of African-American-infused American vernacular’) grind against each other along with the many other voices he ventriloquizes in his writing.” Barnet reminds readers that Díaz’s writing style is based in spoken language—particularly Díaz’s spoken language. This language of “voice” to describe a writer’s style (or, specifically, a writer’s ability to convey a clear sense of who the character is and/or their views) is commonplace but gives the impression that there is a sonic aspect to an author’s work, when in reality it is but a metaphor for something that occurs at the level of text.

A critical reading of a text that includes the audiobook rendition allows critics to add substance to those references to “voice.” In Junot Díaz’s case, it is possible that readers encounter him first through written text, and so have an expectation of what Díaz (or Yunior) would sound like live. In my textual analysis of eight audiobook reviews (and one book review that included a mention of the narration in the audiobook) most listeners showed some sort of discomfort with Díaz’s narration. One reviewer, for example, took issue with the “smoothness” of Díaz’s narration: “At times the reading was a little shaky and uneven”. Another reviewer stated “at times his cadence is choppy, with odd pauses and emphasis on strange words that detract from the overall experience.” Reviewers also took issue with Díaz’s pace, which is characterized by pauses in places that may not seem normal in casual American speech. These statements hint at a “weird” quality in Díaz’s speech, something that does not come through when Díaz has a casual conversation. (Listen to this podcast episode of NPR’s Alt. Latino guest-starring Díaz and compare with this video of him reading part of This Is How You Lose Her.) Although one blogger pointed out that Díaz sounded “professorial” in the reading, others used the words “native,” “authenticity,” “Dominican” and even “Jersey accent” to describe how Díaz sounded. It is unclear how these reviewers define “native” or “authentic.”

“Junot Diaz” by Flickr user ALA The American Library Association, CC BY-NC-SA 2.0

Connecting sound to authenticity implies that Dominicans can only sound a certain way, or that the audio narration is lacking when it does not represent a “typical” Dominican voice. To the extent that Díaz is Dominican, his voice is of a Dominican male who has grown up in the Northeastern United States. His uneven audio narration creates a feeling of sonic unintelligibility in the listener, similar to the effect of including Spanish words in the written text. Díaz-as-narrator can make a listener uncomfortable, and by extension forces that reader to listen.

The sonic unintelligibility also relies on the text, on how Díaz plays with language by switching back and forth from English to Spanish. Díaz mentions in an interview with Marva Hinton that some readers are not happy with his choice of Spanglish in his writing: “There [are] folks who hear one Spanish word, and they’re convinced this is some sort of immigrant conspiracy.” Farther down, in the same article, Díaz refers to his mix of Spanish and English (and a particular kind of Spanish and English at that, since he moves among Standard American English, African American Vernacular English, and Dominican Spanish) as “opaque language.” There’s a connection between the kind of “opaqueness” that Spanish gives and the unintelligible effect of Díaz reading his work.

An example of how sonic unintelligibility operates in the audiobook is the first story, “The Sun, The Moon, The Stars.” This opener, told in first person, revolves around one of Yunior’s break-ups; Yunior and his girlfriend Magdalena, on whom he cheated, go to the Dominican Republic on a trip they had planned before she found out about the affair. It frames the book as an in-depth analysis of loves lost, from the man who keeps losing them. It also sets the tone sonically for the audiobook reading: after the introduction of the book, a snippet of bachata music comes on, and then makes way for Díaz, who reads the title of the story. This is the pattern of the book: slices of bachata, followed by Díaz’s narration.

His voice is characterized by a slight sing-song cadence that is reminiscent of a Dominican Spanish accent. If this were in Spanish, it might be easier to lose track of the cadence, but in English it sounds like a disembodied accent. I showcase the swing in Díaz’s narration by alternating capital letters and lower-case letters: “Her FAther, who usually would treat me like his HIjo, CALLS me an ASShole on the PHONE, SOUNDS like he’s STRANgling himself with the cord.” The voice seems to float for a while until Díaz arrives at the end of a paragraph or a series of sentences, and then it sinks. Moreover, this pattern does not change when Díaz switches characters: it’s hard to tell Yunior apart from Magdalena unless the reader pays close attention to when the narrator is switching characters and/or when the narrator uses a pronoun. The same effect comes from the odd pauses in the author’s narration: “Oh God, she wailed. Oh. My God.”

The choppiness and the emphasis in the reading are a way to dislocate the listener, in much the same way that Spanish phrases or the lack of quotation marks in the text dislocate a reader who does not understand Spanish or who depends on quotation marks to make sense of the prose. Also, this story focuses on Magdalena withdrawing from Yunior and not communicating with him. The tone, cadence, and sound of Díaz’s voice can be read to mirror the relationship between Yunior and Magdalena (and the other women in the text): the sonic unintelligibility is manifest at the level of plot through Yunior’s relationships.

Although many audiobook reviewers may consider the plot in their reviews, part of what makes an audiobook stand out is the performance of the text. I take my cues from audiobook reviewers and consider critically my listening experience of This Is How You Lose Her and how this can become the basis for a critical interpretation of the text. My analysis underscores that having an author read a text can provide a different way into analyzing the text and prompts readers to pay attention to sound. If, as Shokoff asserts, most audiobook readers listen to an audiobook while doing something else, Díaz shows that listening closely to the audio text can be as rewarding as reading a book.

Featured Image: “Junot Diaz” by WBUR Boston’s NPR News Station, Attribution-NonCommercial-NoDerivs License

Liana Silva-Ford is co-founder and Managing Editor of Sounding Out!.

REWIND! . . .If you liked this post, you may also dig:

“‘Or Does It Explode?’ Sounding Out the U.S. Metropolis in Hansberry’s A Raisin in the Sun”-Liana Silva-Ford

“‘HOW YOU SOUND??': The Poet’s Voice, Aura, and the Challenge of Listening to Poetry”-John Hyland

“Fade to Black, Old Sport: How Hip-Hop Amplifies Baz Luhrmann’s The Great Gatsby”-Regina Bradley
