In 1912, British physicist Edmund Fournier d’Albe built a device that he called the optophone, which converted light into tones. The first model—“the exploring optophone”—was meant to be a travel aid; it converted light into a sound of analogous intensity. A subsequent model, “the reading optophone,” scanned print using lamp-light separated into beams by a perforated disk. The pattern of light reflected back from a given character triggered a corresponding set of tones in a telephone receiver. d’Albe initially worked with 8 beams, producing 8 tones based on a diatonic scale. He settled on 5 notes: lower G, and then middle C, D, E and G. (Sol, do, re, mi, sol.) The optophone became known as a “musical print” machine. It was popularized by Mary Jameson, a blind student who achieved reading speeds of 60 words per minute.
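To make the tonal code concrete, here is a minimal sketch (mine, not d’Albe’s engineering): each of the five beams is assigned one fixed note, and the subset of beams that crosses ink in a given column of a printed letter determines the chord heard at that instant. The ink-sounds-a-tone convention and the modern equal-tempered frequencies are assumptions for illustration, not d’Albe’s exact tuning.

```python
# A minimal, illustrative sketch of the reading optophone's tonal code, not
# d'Albe's actual circuitry. Assumes the convention in which a beam sounds its
# note when it falls on ink; pitches are modern equal-tempered stand-ins.

BEAM_NOTES = {            # lower G, then middle C, D, E, G ("sol, do, re, mi, sol")
    "G3": 196.00,
    "C4": 261.63,
    "D4": 293.66,
    "E4": 329.63,
    "G4": 392.00,
}

def column_to_chord(inked_beams):
    """Return the frequencies sounded for one scanned column.

    inked_beams: the set of note names whose beams fall on ink in this column.
    """
    return [freq for note, freq in BEAM_NOTES.items() if note in inked_beams]

# A column where two beams cross the letter's stroke produces a two-note chord;
# as the scan moves across the letter, the changing chords form its "melody."
print(column_to_chord({"C4", "E4"}))   # -> [261.63, 329.63]
print(column_to_chord(set()))          # blank paper -> silence: []
```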
In the field of media studies, the optophone has become renowned through its imaginary repurposings by a number of modernist artists. For one thing, the optophone finds brief mention in Finnegans Wake. In turn, Marshall McLuhan credited James Joyce’s novel with being a new medium, turning text into sound. In “New Media as Political Forms,” McLuhan says that Joyce’s own “optophone principle” releases us from “the metallic and rectilinear embrace of the printed page.” More familiar within media studies today, Dada artist Raoul Hausmann patented (London 1935), but did not successfully build, an optophone presumably inspired by d’Albe’s model, which he hoped would be employed in audiovisual performances. This optophone was meant to convert sound into light as well as the reverse. It was part of a broader contemporary impulse to produce color music and synaesthetic art. Hausmann also wrote optophonetic poetry, based on the sounds and rhythms of “pure phonemes” and non-linguistic noises. In response, Francis Picabia painted two optophone portraits, in 1921 and 1922. Optophone I, below, is composed of lines that might be sound waves, with a pattern that disorders vision.
Theorists have repeatedly located Hausmann’s device at the origin of new media. Authors in the Audiovisuology, Media Archaeology, and Beyond Art: A Third Culture anthologies credit Hausmann’s optophone with bringing-into-being cybernetics, digitization, the CD-ROM, audiovisual experiments in video art, and “primitive computers.” It seems to have escaped notice that d’Albe also used the optophone to create electrical music. In his book, The Moon Element, he writes:
d’Albe’s device is typically portrayed as a historical cul-de-sac, with few users and no real technical influence. Yet optophones continued to be designed for blind people throughout the twentieth century; at least one model has users even today. Musical print machines, or “direct translators,” co-existed with more complex OCR devices—optical character recognizers that converted printed words into synthetic speech. Both types of reading machine contributed to today’s procedures for scanning and document digitization. Arguably, reading optophones intervened more profoundly into the order of print than did Hausmann’s synaesthetic machine: they not only translated between the senses, they introduced a new symbolic system by which to read. Like braille, later vibrating models proposed that the skin could also read.
In December 1922, the Optophone was brought to the United States from the United Kingdom for a demonstration before a number of educators who worked with blind children; only two schools ordered the device. Reading machine development accelerated in the U.S. around World War II. In his position as chair of the National Defense Research Committee, Vannevar Bush established a Committee on Sensory Devices in 1944, largely for the purpose of rehabilitating blind soldiers. The other options for reading—braille and Talking Books—were relatively scarce and had a high cost of production. Reading machines promised to give blind readers access to magazines and ephemeral print (recipes, signs, mail), which was arguably more important than access to books.
At RCA (Radio Corporation of America), the television innovator Vladimir Zworykin became involved with this project. Zworykin had visited Fournier d’Albe in London in the 19-teens and seen a demonstration of the optophone. Working with Les Flory and Winthrop Pike, Zworykin built an initial machine known as the A-2 that operated on the same principles, but used a different mechanism for scanning—an electric stylus, which was publicized as “the first pen that reads.” Following the trail of citations for RCA’s “Reading Aid for the Blind” patent (US 2420716A, filed 1944), it is clear that the “pen” became an aid in domains far afield from blindness. It was repurposed as an optical probe for measuring the oxygen content of blood (1958); an “optical system for facsimile scanners” (1972); and, in a patent awarded to Burroughs Corporation in 1964, a light gun. This gun, in turn, found its way into the handheld controls for the first home video game system, produced by Sanders Associates.
The A-2 optophone was tested on three blind research subjects, including ham radio enthusiast Joe Piechowski, who was more of a technical collaborator. According to the reports RCA submitted to the CSD, these readers were able to correlate the “chirping” or “tweeting” sounds of the machine with letters “at random with about eighty percent accuracy” after 60 hours of practice. Close spacing on a printed page made it difficult to differentiate between letters; readers also had difficulty moving the stylus at a steady pace and in a straight line. Piechowski achieved reading speeds of 20 words per minute, which RCA deemed too slow.
Attempts were made to incorporate “human factors” and create a more efficient tonal code, to reduce reading time as well as learning time and confusion between letters. One alternate auditory display was known as the compressed optophone. Rather than generate multiple tones or chords for a single printed letter, which was highly redundant and confusing to the ear, the compressed version identified only certain features of a printed letter, such as the presence of an ascender or descender. Below is a comparison between the tones of the original optophone and the compressed version, recorded by physicist Patrick Nye in 1965. The following eight lowercase letters make up the source material: f, i, k, j, p, q, r, z.
Original record in the author’s possession. With thanks to Elaine Nye, who generously tracked down two of her personal copies at the author’s request. The second copy is now held at Haskins Laboratories.
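To clarify the difference between the two codes heard above, here is a toy sketch of the compressed principle. The feature set and the tones assigned to them are invented for illustration; they are not Nye’s actual code. The point is simply that each letter emits a few feature tones rather than a chord for every scanned column.

```python
# A toy sketch of the "compressed" principle, not Nye's actual code: rather
# than sounding a chord for every scanned column, signal only a few gross
# features of each printed letter. Feature set and tone assignments below are
# invented for illustration.

FEATURE_TONES_HZ = {
    "ascender":  880.0,   # strokes rising above the x-height, as in f or k
    "descender": 220.0,   # tails dropping below the baseline, as in j, p, q
    "dot":       660.0,   # the dot of i or j
}

LETTER_FEATURES = {       # the eight lowercase letters on Nye's 1965 record
    "f": {"ascender"},
    "i": {"dot"},
    "k": {"ascender"},
    "j": {"dot", "descender"},
    "p": {"descender"},
    "q": {"descender"},
    "r": set(),
    "z": set(),
}

def compressed_tones(letter):
    """Return the (illustrative) feature tones emitted for one printed letter."""
    return sorted(FEATURE_TONES_HZ[f] for f in LETTER_FEATURES[letter])

for ch in "fikjpqrz":
    print(ch, compressed_tones(ch))
```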
Because of the seeming limitations of tonal reading, RCA engineers redirected their research to add character recognition to the scanning process. This was controversial: direct translators like the optophone were perceived as too difficult because they required blind people to do something akin to learning to read print—learning a symbolic tonal or tactile code. At an earlier moment, braille had been critiqued on similar grounds; many in the blind community have argued that mainstream anxieties about braille sprang from its symbolic difference. Speed, moreover, is relative. Reading machine users protested that direct translators like the optophone were inexpensive to build and already available—why wait for the refinement of OCR and synthetic speech? Nevertheless, between November 1946 and May 1947, Zworykin, Flory, and Pike worked on a prototype “letter reading machine,” today widely considered to be the first successful example of optical character recognition (OCR). Before reliable synthetic speech, this device spelled out words letter by letter using tape recordings. The Letter-Reader was too massive and expensive for personal use, however, and its operating speed of 20 words per minute made it hardly an improvement over the A-2 translator.
Haskins Laboratories, another affiliate of the Committee on Sensory Devices, began working on the reading machine problem around the same time, ultimately completing an enormous amount of research into synthetic speech and—as argued by Donald Shankweiler and Carol Fowler—the “speech code” itself. In the 1940s, before workable text-to-speech, researchers at Haskins wanted to determine whether tones or artificial phonemes (“speech-like speech”) were easier to read by ear. They developed a “machine dialect of English,” named wuhzi: “a transliteration of written English which preserved the phonetic patterns of the words.” An example can be played below. The eight source words are: With, Will, Were, From, Been, Have, This, That.
Original record in the author’s possession. From Patrick Nye, “An Investigation of Audio Outputs for a Reading Machine” (1965). With thanks to Elaine Nye.
Based on the results of tests with several human subjects, the Haskins researchers concluded that aural reading via speech-like sounds was necessarily faster than reading via musical tones. Like the RCA engineers, they felt that a fast rate of reading should be a requirement of these machines: at minimum, a speed that kept pace with rapid speech, about 200 words per minute.
Funded by the Veterans Administration, members of Mauch Laboratories in Ohio worked on both musical optophones and spelled-speech recognition machines from the 1950s into the 1970s. One of their many devices, the Visotactor, was a direct-translator with vibro-tactile output for four fingers. Another, the Visotoner, was a portable nine-channel optophone. All of the Mauch machines were tested by Harvey Lauer, a technology transfer specialist for the Veterans Administration for over thirty years, himself blind. Below is an excerpt from a Visotoner demonstration, recorded by Lauer in 1971.
Visotoner demonstration. Original 7” open reel tape in author’s possession. With thanks to Harvey Lauer for sharing items from his impressive collection and for collaborating with the author over many years.
Later on the same tape, Lauer discusses using the Visotoner to read mail, identify currency, check over his own typing, and read printed charts or graphics. He achieved reading speeds of 40 words per minute with the device. Lauer has also told me that he prefers the sound of the Visotoner to that of other optophone models—he compares its sound to Debussy, or the music for dream sequences in films.
Mauch also developed a spelled speech OCR machine called the Cognodictor, which was similar to the RCA model but made use of synthetic speech. In the recording below, Lauer demonstrates this device by reading a print-out about IBM fonts. He simultaneously reads the document with the Visotoner, which reveals glitches in the Cognodictor’s spelling.
Original 7” open reel tape in the author’s possession. With thanks to Harvey Lauer.
In 1972, at the request of Lauer and other blind reading machine users, Mauch assembled a stereo-optophone with ten channels, called the Stereotoner. This device was distributed through the VA but never marketed, and most of the documentation exists in audio format, specifically in sets of training tapes that were made for blinded veterans who were the test subjects. Some promotional materials, such as the short video below, were recorded for sighted audiences—presumably teachers, rehabilitation specialists, or funding agencies.
Video courtesy of Harvey Lauer.
Mary Jameson corresponded with Lauer about the Stereotoner, via tape and braille, in the 1970s. In the braille letter pictured below she comments, “I think that stereotoner signals are the clearest I have heard.”
In 1976, with the marketing of the Kurzweil Reader, funding for direct translation optophones ceased. The Kurzweil Reader was advertised as the first machine capable of multi-font OCR; it consisted of a digital computer and flatbed scanner, and it could recognize a relatively large number of typefaces. Kurzweil recalls in his book The Age of Spiritual Machines that this technology quickly transferred to Lexis-Nexis as a way to retrieve information from scanned documents. As Lauer explained to me, the abandonment of optophones was a serious problem for people with print disabilities: the Kurzweil Readers were expensive ($10,000-$50,000 each); early models were not portable and were mostly purchased by libraries. Despite being advertised as omnifont readers, they could not in fact recognize most printed material. The very fact of CAPTCHAs speaks to the continued failure of machines at perfect character recognition. And, as the “familiarization tapes” distributed to blind readers indicate, the early synthetic speech interface was not transparent—training was required to use the Kurzweil machines.
Original cassette in the author’s possession.
Lauer always felt that the ideal reading machine should have both talking OCR and direct-translation capabilities, the latter being used to get a sense of the non-text items on a printed page, or to “preview material and read unusual and degraded print.” Yet the long history of the optophone demonstrates that certain styles of decoding have been more easily naturalized than others—and symbols have increasingly been favored if they bear a close relation to conventional print or speech. Finally, as computers became widely available, the focus for blind readers shifted, as Lauer puts it, “from reading print to gaining access to computers.” Today, many electronic documents continue to be produced without OCR, and thus cannot be translated by screen readers; graphical displays and videos are largely inaccessible; and portable scanners are far from universal, leaving most “ephemeral” print still unreadable.
Mara Mills is an Assistant Professor of Media, Culture, and Communication at New York University, working at the intersection of disability studies and media studies. She is currently completing a book titled On the Phone: Deafness and Communication Engineering. Articles from this project can be found in Social Text, differences, the IEEE Annals of the History of Computing, and The Oxford Handbook of Sound Studies. Her second book project, Print Disability and New Reading Formats, examines the reformatting of print over the course of the past century by blind and other print disabled readers, with a focus on Talking Books and electronic reading machines. This research is supported by NSF Award #1354297.
The following video installation by Mandie O’Connell is part three of a four-part series, “Round Circle of Resonance,” by the Berlin-based arts collective La Mission, which performs connections between the theory of José Esteban Muñoz and sound art/study/theory/performance.
The first and second installments ran last Monday. The opening salvo, written by La Mission’s resident essayist / deranged propagandist LMGM (Luis-Manuel Garcia), provides a brief introduction to our collective, some reflections on Muñoz’s relevance to our activities, and a frame for the next three missives from our fellow cultists. It is backed with a rousing sermon-cum-manifesto from our charismatic cult-leader/prophet, El Jefe (Pablo Roman-Alcalá). Next Monday, our saucy Choir Boy/Linguist (Johannes Brandis) will close the forum with a dirge to our dearly departed José (August 9, 1967 – December 4, 2013).
–LMGM a.k.a. Luis-Manuel Garcia (curator)
Concept and Performance: Mandie O’Connell
Filming and Editing: Piss Nelke
Music: Khrom Ju (La Mission)
Piss is Power.
Power exists in urination, in this basic and most crucial of bodily acts. Problems with urination can result in embarrassment, infection, hospitalization. And yet so many of us women encounter confining, unfair, cruel, and Puritan limitations to where, when, and how we can pee, while our male counterparts traipse around urinating wherever they please. It is time, brothers and sisters, to re-politicize piss.
Brother Muñoz taught us that utopian projects require fellow participants, not audiences. We need a Urinary Utopia, a Piss Paradise that is open to men, women, trans and intersex people of all colors. Let’s shower down a blissful piss, a rainbow-colored golden shower where we all can piss wherever the fuck we want to!
In my performance video, I attempt to create a Muñoz-inspired utopian sensibility through the enactment of a new modality of an everyday action. I use a Female Urination Device—which enables me to stand up and urinate—to take a Yellow Adventure around my neighborhood. I piss freely in places where my penis-having brethren piss. I piss in a urinal next to which “Piss on me Bitch” is crudely scrawled. I piss into the river Spree, symbolically owning it with my liquid gold. Finally, I write my name in piss, a macho action turned feminine, the power and privilege of said action redirected towards my vagina.
In “Standing Up,” three different sounds are mixed together to create the soundscape of the performance: ambient noise, music, and sound clips of urination. The ambient noise serves to locate the scene in space/time. The music by Khrom Ju was selected to give the performance an eerie, strange, and repetitive undertone. The sound of urination was recorded live and is the sound of female urination. We use this sound both as a cue and as comic relief. Piss is funny, piss is strange, and piss happens all around us.
The female struggle around urination is a real struggle that really happens and really matters. Exceptionally long lines for the ladies’ room, the inability to publicly urinate at festivals due to feeling exposed and shamed, being charged money to use toilet facilities when males can piss outdoors for free, getting forced to use a ladies’ room when your sexuality sways towards using the men’s room, the list of complaints goes on and on. So I say: pee where you want, not where others want you to. Pee on administrators, police, politicians, and oppressors of all kinds while you’re at it!
I refuse to adhere to these rules anymore, and I beg you to follow my lead.
Piss is Power.
Featured Image adapted from “Pee” by Flickr User Melissa Eleftherion Carr
Mandie O’Connell (yo), aka “Knuckle Cartel,” is a former big cheese and intellectual powerhouse behind the wildly successful Seattle-based experimental theater company Implied Violence. I, Mandie, have experienced the same “conservatism” and capitalistic partnership between Money and Art in the performance/theater scene. Witnessing firsthand the immense power that cash-wielding creeps hold over creatives is sickening, sad, and sordid. I’ve had enough, and so have you…right? Let’s fix a broken system. If we can’t fix it, let’s circumvent it.
It’s an all too familiar movie trope. A bug hidden in a flower jar. A figure in shadows crouched listening at a door. The tape recording that no one knew existed, revealed at the most decisive of moments. Even the abrupt disconnection of a phone call manages to arouse the suspicion that we are never as alone as we may think. And although surveillance derives its meaning from the Latin “vigilare” (to watch) and the French “sur-” (over), its deep connotations of listening have all but obliterated that distinction.
Moving on from cybernetic games, this installment turns to modes of surveillance that work through composition and patterns. Here, Robin James challenges us to consider the unfamiliar resonances produced by our IP addresses, search histories, credit trails, and Facebook posts. How does the NSA transform our data footprints into the sweet, sweet music of surveillance? Shhhhhhhh! Let’s listen in. . . -AT
Kate Crawford has argued that there’s a “big metaphor gap in how we describe algorithmic filtering.” Specifically, its “emergent qualities” are particularly difficult to capture. This process, algorithmic dataveillance, finds and tracks dynamic patterns of relationships amongst otherwise unrelated material. I think that acoustics can fill the metaphor gap Crawford identifies. Because it focuses on identifying emergent patterns within a structure of data, rather than on their cause or source, algorithmic dataveillance isn’t panoptic but acousmatic. Algorithmic dataveillance is acousmatic because it does not observe identifiable subjects but ambient data environments, and it “listens” for harmonics to emerge as variously combined data points fall into and out of phase/statistical correlation.
Dataveillance defines the form of surveillance that saturates our consumer information society. As this promotional Intel video explains, big data transcends the limits of human perception and cognition – it sees connections we cannot. And, as is the case with all superpowers, this is both a blessing and a curse. Although I appreciate emails from my local supermarket that remind me when my favorite bottle of wine is on sale, data profiling can have much more drastic and far-reaching effects. As Frank Pasquale has argued, big data can determine access to important resources like jobs and housing, often in ways that reinforce and deepen social inequities. Dataveillance is an increasingly prominent and powerful tool that determines many of our social relationships.
The term dataveillance was coined in 1988 by Roger Clarke, and refers to “the systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons.” In this context, the person is the object of surveillance and data is the medium through which that surveillance occurs. Writing 20 years later, Michael Zimmer identifies a phase-shift in dataveillance that coincides with the increased popularity and dominance of “user-generated and user-driven Web technologies” (2008). These technologies, found today in big social media, “represent a new and powerful ‘infrastructure of dataveillance,’ which brings about a new kind of panoptic gaze of both users’ online and even their offline activities” (Zimmer 2007). Metadataveillance and algorithmic filtering, however, are not variations on panopticism, but practices modeled—both historically/technologically and metaphorically—on acoustics.
In 2013, Edward Snowden’s infamous leaks revealed the nuts and bolts of the National Security Agency’s massive dataveillance program. The agency was collecting data records that, according to the Washington Post, included “e-mails, attachments, address books, calendars, files stored in the cloud, text or audio or video chats and ‘metadata’ that identify the locations, devices used and other information about a target.” The most enduringly controversial aspect of NSA dataveillance programs has been the bulk collection of Americans’ data and metadata—in other words, the “big data”-veillance programs.
Instead of intercepting only the communications of known suspects, this big dataveillance collects everything from everyone and mines that data for patterns of suspicious behavior; patterns that are consistent with what algorithms have identified as, say, “terrorism.” As Cory Doctorow writes in BoingBoing, “Since the start of the Snowden story in 2013, the NSA has stressed that while it may intercept nearly every Internet user’s communications, it only ‘targets’ a small fraction of those, whose traffic patterns reveal some basis for suspicion.” “Suspicion,” here, is an emergent property of the dataset, a pattern or signal that becomes legible when you filter communication (meta)data through algorithms designed to hear that signal amidst all the noise.
Hearing a signal from amidst the noise, however, is not sufficient to consider surveillance acousmatic. “Panoptic” modes of listening and hearing, though epitomized by the universal and internalized gaze of the guards in the tower, might also be understood as the universal and internalized ear of the confessor. This is the ear that, for example, listens for conformity between bodily and vocal gender presentation. It is also the ear of audio scrobbling, which, as Calum Marsh has argued, is a confessional, panoptic music listening practice.
Therefore, when President Obama argued that “nobody is listening to your telephone calls,” he was correct. But only insofar as nobody (human or AI) is “listening” in the panoptic sense. The NSA does not listen for the “confessions” of already-identified subjects. For example, this court order to Verizon doesn’t demand recordings of the audio content of the calls, just the metadata. Again, the Washington Post explains:
The data doesn’t include the speech in a phone call or words in an email, but includes almost everything else, including the model of the phone and the “to” and “from” lines in emails. By tracing metadata, investigators can pinpoint a suspect’s location to specific floors of buildings. They can electronically map a person’s contacts, and their contacts’ contacts.
NSA dataveillance listens acousmatically because it hears the patterns of relationships that emerge from various combinations of data—e.g., which people talk and/or meet where and with what regularity. Instead of listening to identifiable subjects, the NSA identifies and tracks emergent properties that are statistically similar to already-identified patterns of “suspicious” behavior. Legally, the NSA is not required to identify a specific subject to surveil; instead they listen for patterns in the ambience. This type of observation is “acousmatic” in the sound studies sense because the sounds/patterns don’t come from one identifiable cause; they are the emergent properties of an aggregate.
Acousmatic listening is a particularly appropriate metaphor for NSA-style dataveillance because the emergent properties (or patterns) of metadata are comparable to the harmonics or partials of sound, the resonant frequencies that emerge from a specific combination of primary tones and overtones. If data is like a sound’s primary tone, metadata is its overtones. When two or more tones sound simultaneously, harmonics emerge as their overtones vibrate with and against one another. In Western music theory, something sounds dissonant and/or out of tune when the harmonics don’t vibrate synchronously or proportionally; conversely, tones that are perfectly in tune produce consonant harmonics. The NSA is listening for harmonics. They seek metadata that statistically correlates with a pattern (such as “terrorism”), or is suspiciously out of correlation with a pattern (such as US “citizenship”). Instead of listening to identifiable sources of data, the NSA listens for correlations among data.
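To make the metaphor concrete, here is a deliberately toy sketch: an illustration of acousmatic listening as described above, not any agency’s actual method. Given a few streams of invented call metadata, “listening” means computing the relationships among the streams rather than inspecting any content; a pair of lines whose activity rises and falls together produces a strong correlation, the emergent “harmonic” of the aggregate.

```python
# A deliberately toy illustration of the metaphor, not any agency's actual
# method: "listen" only to relationships among metadata streams (here, hourly
# call counts per line), never to content. Streams whose activity rises and
# falls together yield a strong correlation, the emergent "harmonic."

from itertools import combinations
from statistics import correlation  # Pearson correlation, Python 3.10+

hourly_call_counts = {   # invented counts for three otherwise unrelated lines
    "line_a": [0, 1, 5, 9, 2, 0, 0, 4],
    "line_b": [1, 2, 6, 8, 3, 1, 0, 5],   # moves with line_a
    "line_c": [7, 0, 0, 1, 9, 2, 6, 0],   # does not
}

for (name_x, xs), (name_y, ys) in combinations(hourly_call_counts.items(), 2):
    r = correlation(xs, ys)
    note = "  <- harmonic" if abs(r) > 0.8 else ""
    print(f"{name_x} ~ {name_y}: r = {r:+.2f}{note}")
```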
Both panopticism and acousmaticism are technologies that incite behavior and compel people to act in certain ways. However, they use different methods, which, in turn, incite different behavioral outcomes. Panopticism maximizes efficiency and productivity by compelling conformity to a standard or norm. According to Michel Foucault, the outcome of panoptic surveillance is a society where everyone synchs to an “obligatory rhythm imposed from the outside” (151-2), such as the rhythmic divisions of the clock (150). In other words, panopticism transforms people into interchangeable cogs in an industrial machine. Methodologically, panopticism demands self-monitoring. Foucault emphasizes that panopticism functions most efficiently when the gaze is internalized, when one “assumes responsibility for the constraints of power” and “makes them play…upon himself” (202). Panopticism requires individuals to synchronize themselves with established compulsory patterns.
Acousmaticism, on the other hand, aims for dynamic attunement between subjects and institutions, an attunement that is monitored and maintained by a third party (in this example, the algorithm). For example, Facebook’s News Feed algorithm facilitates the mutual adaptation of norms to subjects and subjects to norms. Facebook doesn’t care what you like; instead it seeks to transform your online behavior into a form of efficient digital labor. In order to do this, Facebook must adjust, in part, to you. Methodologically, this dynamic attunement is not a practice of internalization: unlike Foucault’s panopticon, big dataveillance leverages outsourcing and distribution. There is so much data that no one individual—indeed, no one computer—can process it efficiently and intelligibly. The work of dataveillance is distributed across populations, networks, and institutions, and the surveilled “subject” emerges from that work (for example, Rob Horning’s concept of the “data self”). Acousmaticism tunes into the rhythmic patterns that synch up with and amplify its cycles of social, political, and economic reproduction.
Unlike panopticism, which uses disciplinary techniques to eliminate noise, acousmaticism uses biopolitical techniques to allow profitable signals to emerge as clearly and frictionlessly as possible amid all the noise (for more on the relation between sound and biopolitics, see my previous SO! essay). Acousmaticism and panopticism are analytically discrete, yet applied in concert. For example, certain tiers of the North Carolina state employees’ health plan require so-called “obese” and tobacco-using members to commit to weight-loss and smoking-cessation programs. If these members are to remain eligible for their selected level of coverage, they must track and report their program-related activities (such as exercise). People who exhibit patterns of behavior that are statistically risky and unprofitable for the insurance company are subject to extra layers of surveillance and discipline. Here, acousmatic techniques regulate the distribution and intensity of panoptic surveillance. To use Nathan Jurgenson’s turn of phrase, acousmaticism determines “for whom” the panoptic gaze matters. To be clear, acousmaticism does not replace panopticism; my claim is more modest. Acousmaticism is an accurate and productive metaphor for theorizing both the aims and methods of big dataveillance, which is, itself, one instrument in today’s broader surveillance ensemble.
Featured image “Big Brother 13/365” by Dennis Skley CC BY-ND.
Robin James is Associate Professor of Philosophy at UNC Charlotte. She is author of two books: Resilience & Melancholy: pop music, feminism, and neoliberalism will be published by Zer0 books this fall, and The Conjectural Body: gender, race and the philosophy of music was published by Lexington Books in 2010. Her work on feminism, race, contemporary continental philosophy, pop music, and sound studies has appeared in The New Inquiry, Hypatia, differences, Contemporary Aesthetics, and the Journal of Popular Music Studies. She is also a digital sound artist and musician. She blogs at its-her-factory.com and is a regular contributor to Cyborgology.