In 1912, British physicist Edmund Fournier d’Albe built a device that he called the optophone, which converted light into tones. The first model—“the exploring optophone”—was meant to be a travel aid; it converted light into a sound of analogous intensity. A subsequent model, “the reading optophone,” scanned print using lamp-light separated into beams by a perforated disk. The pattern of light reflected back from a given character triggered a corresponding set of tones in a telephone receiver. d’Albe initially worked with 8 beams, producing 8 tones based on a diatonic scale. He settled on 5 notes: lower G, and then middle C, D, E and G. (Sol, do, re, mi, sol.) The optophone became known as a “musical print” machine. It was popularized by Mary Jameson, a blind student who achieved reading speeds of 60 words per minute.
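To make the scanning principle concrete, here is a minimal sketch of how a column of five beams might turn a printed character into a sequence of chords. It is an illustration, not a reconstruction of Fournier d’Albe’s circuitry. The assumptions are mine: the letter is a five-row grid in which `#` marks ink, a beam that falls on ink sounds its note, the top row maps to the highest note, the five notes are placed at G3 through G4, and the frequencies are equal-tempered with A4 at 440 Hz. The function name `scan_columns` is hypothetical.

```python
# Hypothetical illustration of a five-beam "reading optophone" scan.
# Note names follow the text (low G, then C, D, E, G); the octave placement,
# frequencies, and row-to-note order are assumptions for this sketch.
NOTES = [
    ("G4", 392.00),  # top beam (assumed)
    ("E4", 329.63),
    ("D4", 293.66),
    ("C4", 261.63),
    ("G3", 196.00),  # bottom beam (assumed)
]


def scan_columns(glyph):
    """Return, for each column of the glyph, the note names that sound.

    `glyph` is a list of equal-length strings, one per beam, where '#'
    marks ink and any other character marks blank paper.
    """
    if len(glyph) != len(NOTES):
        raise ValueError("glyph must have one row per beam")
    width = len(glyph[0])
    if any(len(row) != width for row in glyph):
        raise ValueError("all glyph rows must have the same width")

    chords = []
    for x in range(width):  # the scanner moves left to right across the letter
        chord = [name for (name, _freq), row in zip(NOTES, glyph) if row[x] == "#"]
        chords.append(chord)
    return chords


# A crude capital "L": a full vertical stroke, then a foot along the baseline.
LETTER_L = [
    "#..",
    "#..",
    "#..",
    "#..",
    "###",
]

for column, chord in enumerate(scan_columns(LETTER_L)):
    print(column, chord or "silence")
```

Under these assumptions the vertical stroke of the "L" sounds all five notes at once, after which only the lowest note continues along the baseline. A reader would learn to associate such a contour with the letter.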
In the field of media studies, the optophone has become renowned through its imaginary repurposings by a number of modernist artists. For one thing, the optophone finds brief mention in Finnegans Wake. In turn, Marshall McLuhan credited James Joyce’s novel with being a new medium, turning text into sound. In “New Media as Political Forms,” McLuhan says that Joyce’s own “optophone principle” releases us from “the metallic and rectilinear embrace of the printed page.” More familiar within media studies today, Dada artist Raoul Hausmann patented (London 1935), but did not successfully build, an optophone presumably inspired by d’Albe’s model, which he hoped would be employed in audiovisual performances. This optophone was meant to convert sound into light as well as the reverse. It was part of a broader contemporary impulse to produce color music and synaesthetic art. Hausmann also wrote optophonetic poetry, based on the sounds and rhythms of “pure phonemes” and non-linguistic noises. In response, Francis Picabia painted two optophone portraits in 1921 and 1922. Optophone I, below, is composed of lines that might be sound waves, with a pattern that disorders vision.
Theorists have repeatedly located Hausmann’s device at the origin of new media. Authors in the Audiovisuology, Media Archaeology, and Beyond Art: A Third Culture anthologies credit Hausmann’s optophone with bringing-into-being cybernetics, digitization, the CD-ROM, audiovisual experiments in video art, and “primitive computers.” It seems to have escaped notice that d’Albe also used the optophone to create electrical music. In his book, The Moon Element, he writes:
d’Albe’s device is typically portrayed as a historical cul-de-sac, with few users and no real technical influence. Yet optophones continued to be designed for blind people throughout the twentieth century; at least one model has users even today. Musical print machines, or “direct translators,” co-existed with more complex OCR-devices—optical character recognizers that converted printed words into synthetic speech. Both types of reading machine contributed to today’s procedures for scanning and document digitization. Arguably, reading optophones intervened more profoundly into the order of print than did Hausmann’s synaesthetic machine: they not only translated between the senses, they introduced a new symbolic system by which to read. Like braille, later vibrating models proposed that the skin could also read.
In December 1922, the Optophone was brought to the United States from the United Kingdom for a demonstration before a number of educators who worked with blind children; only two schools ordered the device. Reading machine development accelerated in the U.S. around World War II. In his position as chair of the National Defense Research Committee, Vannevar Bush established a Committee on Sensory Devices in 1944, largely for the purpose of rehabilitating blind soldiers. The other options for reading—braille and Talking Books—were relatively scarce and had a high cost of production. Reading machines promised to give blind readers access to magazines and ephemeral print (recipes, signs, mail), which was arguably more important than access to books.
At RCA (Radio Corporation of America), the television innovator Vladimir Zworykin became involved with this project. Zworykin had visited Fournier d’Albe in London in the 1910s and seen a demonstration of the optophone. Working with Les Flory and Winthrop Pike, Zworykin built an initial machine known as the A-2 that operated on the same principles, but used a different mechanism for scanning: an electric stylus, which was publicized as “the first pen that reads.” Following the trail of citations for RCA’s “Reading Aid for the Blind” patent (US 2420716A, filed 1944), it is clear that the “pen” became an aid in domains far afield from blindness. It was repurposed as an optical probe for measuring the oxygen content of blood (1958); an “optical system for facsimile scanners” (1972); and, in a patent awarded to Burroughs Corporation in 1964, a light gun. This gun, in turn, found its way into the handheld controls for the first home video game system, produced by Sanders Associates.
The A-2 optophone was tested on three blind research subjects, including ham radio enthusiast Joe Piechowski, who was more of a technical collaborator. According to the reports RCA submitted to the CSD, these readers were able to correlate the “chirping” or “tweeting” sounds of the machine with letters “at random with about eighty percent accuracy” after 60 hours of practice. Close spacing on a printed page made it difficult to differentiate between letters; readers also had difficulty moving the stylus at a steady pace and in a straight line. Piechowski achieved reading speeds of 20 words per minute, which RCA deemed too slow.
Attempts were made to incorporate “human factors” and create a more efficient tonal code, to reduce reading time as well as learning time and confusion between letters. One alternate auditory display was known as the compressed optophone. Rather than generate multiple tones or chords for a single printed letter, which was highly redundant and confusing to the ear, the compressed version identified only certain features of a printed letter, such as the presence of an ascender or descender. Below is a comparison between the tones of the original optophone and the compressed version, recorded by physicist Patrick Nye in 1965. The following eight lowercase letters make up the source material: f, i, k, j, p, q, r, z.
Original record in the author’s possession. With thanks to Elaine Nye, who generously tracked down two of her personal copies at the author’s request. The second copy is now held at Haskins Laboratories.
Because of the seeming limitations of tonal reading, RCA engineers re-directed their research to add character recognition to the scanning process. This was controversial, direct translators like the optophone being perceived as too difficult because they required blind people to do something akin to learning to read print—learning a symbolic tonal or tactile code. At an earlier moment, braille had been critiqued on similar grounds; many in the blind community have argued that mainstream anxieties about braille sprang from its symbolic difference. Speed, moreover, is relative. Reading machine users protested that direct translators like the optophone were inexpensive to build and already available—why wait for the refinement of OCR and synthetic speech? Nevertheless, between November 1946 and May 1947, Zworykin, Flory, and Pike worked on a prototype “letter reading machine,” today widely considered to be the first successful example of optical character recognition (OCR). Before reliable synthetic speech, this device spelled out words letter by letter using tape recordings. The Letter-Reader was too massive and expensive for personal use, however. It also had an operating speed of 20 words per minute—thus it was hardly an improvement over the A-2 translator.
Haskins Laboratories, another affiliate of the Committee on Sensory Devices, began working on the reading machine problem around the same time, ultimately completing an enormous amount of research into synthetic speech and—as argued by Donald Shankweiler and Carol Fowler—the “speech code” itself. In the 1940s, before workable text-to-speech, researchers at Haskins wanted to determine whether tones or artificial phonemes (“speech-like speech”) were easier to read by ear. They developed a “machine dialect of English,” named wuhzi: “a transliteration of written English which preserved the phonetic patterns of the words.” An example can be played below. The eight source words are: With, Will, Were, From, Been, Have, This, That.
Original record in the author’s possession. From Patrick Nye, “An Investigation of Audio Outputs for a Reading Machine” (1965). With thanks to Elaine Nye.
Based on the results of tests with several human subjects, the Haskins researchers concluded that aural reading via speech-like sounds was necessarily faster than reading musical tones. Like the RCA engineers, they felt that a requirement of these machines should be a fast rate of reading. Minimally, they felt that reading speed should keep pace with rapid speech, at about 200 words per minute.
Funded by the Veterans Administration, members of Mauch Laboratories in Ohio worked on both musical optophones and spelled-speech recognition machines from the 1950s into the 1970s. One of their many devices, the Visotactor, was a direct-translator with vibro-tactile output for four fingers. Another, the Visotoner, was a portable nine-channel optophone. All of the Mauch machines were tested by Harvey Lauer, a technology transfer specialist for the Veterans Administration for over thirty years, himself blind. Below is an excerpt from a Visotoner demonstration, recorded by Lauer in 1971.
Visotoner demonstration. Original 7” open reel tape in author’s possession. With thanks to Harvey Lauer for sharing items from his impressive collection and for collaborating with the author over many years.
Later on the same tape, Lauer discusses using the Visotoner to read mail, identify currency, check over his own typing, and read printed charts or graphics. He achieved reading speeds of 40 words per minute with the device. Lauer has also told me that he prefers the sound of the Visotoner to that of other optophone models—he compares its sound to Debussy, or the music for dream sequences in films.
Mauch also developed a spelled speech OCR machine called the Cognodictor, which was similar to the RCA model but made use of synthetic speech. In the recording below, Lauer demonstrates this device by reading a print-out about IBM fonts. He simultaneously reads the document with the Visotoner, which reveals glitches in the Cognodictor’s spelling.
Original 7” open reel tape in the author’s possession. With thanks to Harvey Lauer.
In 1972, at the request of Lauer and other blind reading machine users, Mauch assembled a stereo-optophone with ten channels, called the Stereotoner. This device was distributed through the VA but never marketed, and most of the documentation exists in audio format, specifically in sets of training tapes that were made for blinded veterans who were the test subjects. Some promotional materials, such as the short video below, were recorded for sighted audiences—presumably teachers, rehabilitation specialists, or funding agencies.
Video courtesy of Harvey Lauer.
Mary Jameson corresponded with Lauer about the Stereotoner, via tape and braille, in the 1970s. In the braille letter pictured below she comments, “I think that stereotoner signals are the clearest I have heard.”
In 1973, with the marketing of the Kurzweil Reader, funding for direct translation optophones ceased. The Kurzweil Reader was advertised as the first machine capable of multi-font OCR; it was made up of a digital computer and flatbed scanner and it could recognize a relatively large number of typefaces. Kurzweil recalls in his book The Age of Spiritual Machines that this technology quickly transferred to Lexis-Nexis as a way to retrieve information from scanned documents. As Lauer explained to me, the abandonment of optophones was a serious problem for people with print disabilities: the Kurzweil Readers were expensive ($10,000-$50,000 each); early models were not portable and were mostly purchased by libraries. Despite being advertised as omnifont readers, they could not in fact recognize most printed material. The very fact of captchas speaks to the continued failures of perfect character recognition by machines. And, as the “familiarization tapes” distributed to blind readers indicate, the early synthetic speech interface was not transparent—training was required to use the Kurzweil machines.
Original cassette in the author’s possession.
Lauer always felt that the ideal reading machine should have both talking OCR and direct-translation capabilities, the latter being used to get a sense of the non-text items on a printed page, or to “preview material and read unusual and degraded print.” Yet the long history of the optophone demonstrates that certain styles of decoding have been more easily naturalized than others—and symbols have increasingly been favored if they bear a close relation to conventional print or speech. Finally, as computers became widely available, the focus for blind readers shifted, as Lauer puts it, “from reading print to gaining access to computers.” Today, many electronic documents continue to be produced without OCR, and thus cannot be translated by screen readers; graphical displays and videos are largely inaccessible; and portable scanners are far from universal, leaving most “ephemeral” print still unreadable.
Mara Mills is an Assistant Professor of Media, Culture, and Communication at New York University, working at the intersection of disability studies and media studies. She is currently completing a book titled On the Phone: Deafness and Communication Engineering. Articles from this project can be found in Social Text, differences, the IEEE Annals of the History of Computing, and The Oxford Handbook of Sound Studies. Her second book project, Print Disability and New Reading Formats, examines the reformatting of print over the course of the past century by blind and other print disabled readers, with a focus on Talking Books and electronic reading machines. This research is supported by NSF Award #1354297.
Editor’s Note: Welcome to Sounding Out!’s fall forum titled “Sound and Play,” where we ask how sound studies, as a discipline, can help us to think through several canonical perspectives on play. While Johan Huizinga had once argued that play is the primeval foundation from which all culture has sprung, it is important to ask where sound fits into this construction of culture; does it too have the potential to liberate or re-entrench our social worlds? SO!’s new regular contributor Enongo Lumumba-Kasongo notes how audio games, like Papa Sangre, often use sound as a gimmick to engage players, and considers the politics of this feint. For whom are audio games immersive, and how does the experience serve to further marginalize certain people or disadvantaged groups?–AT
Immersion is a problem at the heart of sound studies. As Frances Dyson (2009) suggests in Sounding New Media, “Sound is the immersive medium par excellence. Three dimensional, interactive and synesthetic, perceived in the here and now of an embodied space, sound returns to the listener the very same qualities that media mediates…Sound surrounds” (4). Alternately, in the context of games studies (a field that is increasingly engaged with sound studies), issues of sound and immersion have most recently been addressed in terms of instrumental potentialities, historical developments, and technical constraints. Some notable examples include Sander Huiberts’ (2010) M.A. thesis entitled “Captivating Sound: The Role of Audio Immersion for Computer Games,” in which he details technical and philosophical frames of immersion as they relate to the audio of a variety of computer games, and an article by Aaron Oldenburg (2013) entitled “Sonic Mechanics: Audio as Gameplay,” in which he situates the immersive aspects of audio-gameplay within contemporaneous experimental art movements. This research provokes the question: How do those who develop these games construct the idea of immersion through game design and what does this mean for users who challenge this construct? Specifically I would like to challenge Dyson’s claim that sound really is “the immersive medium par excellence” by considering how the concept of immersion in audio-based gameplay can be tied to privileged notions of character and game development.
In order to investigate this problem, I decided to play an audio game and document my daily experiences on a WordPress blog. Based on its simulation of 3D audio, Papa Sangre was the first game that came to mind. I also selected the game because of its accessibility; unlike the audio game Deep Sea, which is celebrated for its immersive capacities but is only playable by request at The Museum of Art and Digital Entertainment, Papa Sangre is purchasable as an app for $2.99 and can be played on an iPhone, iPad or iPod. Papa Sangre helps us to consider new possibilities for what is meant by virtual space and it serves as a useful tool for pushing back against essentialisms of “immersion” when talking about sound and virtual space.
Papa Sangre consists of 25 levels, the completion of which leads the player incrementally closer to the palace of Papa Sangre, a man who has kidnapped a close friend of the protagonist. The game boasts real-time binaural audio, meaning that the game’s diegetic sounds (sounds that the character in the game world can “hear”) pan across the player’s headphones in relation to the movement of the game’s protagonist. The objective of each level is to locate and collect musical notes that are scattered through the game’s many topographies while avoiding any number of enemies and obstacles, of course.
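For readers who want a sense of what it means for a sound to “pan across the headphones” as the protagonist moves, here is a deliberately simplified sketch. It is not the game’s engine, whose internals I have not seen. True binaural rendering typically relies on head-related filtering, whereas this example swaps in plain constant-power stereo panning. The function name `stereo_gains`, the 2D coordinates, and the convention that the heading is measured counter-clockwise from the x-axis are all assumptions made for illustration.

```python
import math


def stereo_gains(listener_xy, heading_rad, source_xy):
    """Return (left_gain, right_gain) for a sound source.

    Simplified constant-power panning, NOT true binaural rendering:
    only the left/right position of the source relative to the
    direction the listener is facing affects the result.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]

    # Angle of the source relative to the facing direction.
    # Positive values are counter-clockwise, i.e. to the listener's left.
    bearing = math.atan2(dy, dx) - heading_rad

    # Lateral position in [-1, 1]: -1 is fully left, +1 is fully right.
    pan = -math.sin(bearing)

    # Constant-power law: left^2 + right^2 == 1 at every pan position.
    theta = (pan + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


# Listener at the origin, facing "north" (the +y direction).
facing_north = math.pi / 2
for label, source in [("ahead", (0, 1)), ("right", (1, 0)), ("left", (-1, 0))]:
    left, right = stereo_gains((0, 0), facing_north, source)
    print(f"{label}: left={left:.2f} right={right:.2f}")
```

As the listener turns or walks, the same source yields different gains, which is the effect the paragraph above describes. Note that this sketch cannot distinguish a sound directly ahead from one directly behind; resolving that ambiguity is part of what fuller binaural techniques attempt.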
A commercial success, Papa Sangre has been named “Game of the Week” by Apple, received a 9/10 rating from IGN, a top review from 148apps, and many positive reviews from fans. Gamezebo concludes an extremely positive review of Papa Sangre by calling it “a completely unique experience. It’s tense and horrifying and never lets you relax. By focusing on one aspect of the game so thoroughly, the developers have managed to create something that does one thing really, really well…Just make sure to play with the lights on.” This commercial attention has yielded academic feedback as well. In a paper entitled “Towards an analysis of Papa Sangre, an audio-only game for the iPhone/iPad,” Andrew Hugill (2012) celebrates games like Papa Sangre for providing “an excellent opportunity for the development of a new framework for electroacoustic music analysis.” Despite such attention–and perhaps because of it–I argue that Papa Sangre deserves a critical second listen.
Between February and April of 2012, I played Papa Sangre several times a day and detailed the auditory environments of the game in my blog posts. However, by the time I reached the final level, I still wasn’t sure how to answer my initial question. Had Papa Sangre really engendered a novel experience, or could it simply be thought of as a video game with no video? I noted in my final post:
I am realizing that what makes the audio gaming experience seem so different from the experience of playing video games is the perception that the virtual space, the game itself, only exists through me. The “space” filled by the levels and characters within the game only exists between my ears after it is projected through the headphones and then I extend this world through my limbs to my extremities, which feeds back into the game through the touch screen interface, moving in a loop like an electric current…Headphones are truly a necessity in order to beat the game, and in putting them on, the user becomes the engine through which the game comes to life…When I play video games, even the ones that utilize a first-person perspective, I feel like the game space exists outside of me, or rather ahead of me, and it is through the controller that I am able to project my limbs forward into the game world, which in turn structures how I orient my body. Video game spaces of course, do not exist outside of me, as I need my eyes and ears to interpret the light waves and sound waves that travel back from the screen, but I suppose what matters here is not what is actually happening, but how what is happening is perceived by the user. Audio games have the potential to engender completely different gaming experiences because they make the user feel like he or she is the platform through which the game-space is actualized.
Upon further reflection, however, I recognize that Papa Sangre creates an environment designed to be immersive only to certain kinds of users. A close reading of Papa Sangre reveals bias against both female and disabled players.
Take Papa Sangre’s problematic relationship with blindness. The protagonist is not a visually impaired individual operating in a horrifying new world, but rather a sighted individual who is thrust into a world that is horrifying by virtue of its darkness. The first level of the game is simply entitled “In the Dark.” When the female guide first appears to the protagonist in that same level, she states:
Here you are in the land of the dead, the realm ruled by Papa Sangre…In this underworld it is pitch dark. You cannot see a thing; you can’t even see me, a fluttery watery thing here to help you. But you can listen and move…You must learn how to see with your ears. You will need these powers to save the soul in peril and make your way to the light.
Note the conversation between 3:19 and 3:56.
The game envisions an audience who find blindness to be necessarily terrifying. By equating an inability to see with death and fear, developers are intensifying popular horror genre tropes that diminish the lived experiences of those with visual impairments and unquestioningly present blindness as a problem to overcome. Rather than challenging the relationship between blindness and vulnerability that horror-game developers fetishize, Papa Sangre misses the opportunity to present a visually impaired protagonist who is not crippled by his or her disability.
Disconcertingly, audio games have been tied to game accessibility efforts by developers and players alike for many years. In a 2008 interview, Kenji Eno, founder of WARP (a company that specialized in audio games in the late 90s), claimed his interactions with visually impaired gamers yielded a desire to produce audio games. Similarly, forums like audiogames.net showcase users and developers interested in games that cater to gamers with impaired vision.
In terms of its actual game-play, Papa Sangre is navigable without visual cues. After playing the game for just two weeks I was able to explore each level with my eyes closed. Still, the ease with which gamers can play the game without looking at the screen does not negate the tension caused by recycled depictions of disability that are in many ways built into the storyline’s foundation.
The game also fails to engage gender in any complexity. Although the main character’s appearance is never shown, the protagonist is aurally gendered male. Most notable are the deep grunting noises made when he falls to the ground. For me, this acted as a barrier to imagining a fully embodied virtual experience. Those deep grunts revealed many assumptions the designers must have considered about the imagined and perhaps intended audience of the game. While lack of diversity is certainly an issue at the heart of all entertainment media, Papa Sangre’s oversight directly contradicts the message of the game, wherein the putative goal is to experience an environment that enhances one’s sense of self within the virtual space.
On October 31st, 2013, Somethin’ Else will release Papa Sangre II. A quick look at the trailer suggests that the developers have not changed the formula. The 46-second clip warns that the game is “powered by your fear” after noting, “This Halloween, you are dead.”
It appears that an inability to see is still deeply connected with notions of fear and death in the game’s sequel. This does not have to be the case. Why not design a game where impairment is not framed as a hindrance or source of fear? Why not build a game with the option to choose between different sounding voice actors and actresses? Despite its popularity, however, Papa Sangre is by no means representative of general trends across the spectrum of audio-based game design. Oldenburg (2013) points out that over the past decade many independent game developers have been designing experimental “blind games” that eschew themes and representations found in popular video games in favor of the abstract relationships between diegetic sound and in-game movement.
Whether or not they eventually consider the social politics of gaming, Papa Sangre’s developers already send a clear message to all gamers by hardwiring disability and gender into both versions of the game while promoting a limited image of “immersion.” Hopefully as game designers Somethin’ Else grow in popularity and prestige, future developers that use the “Papa Engine” will be more cognizant of the privilege and discrimination embedded in the sonic cues of its framework. Until then, if you are not a sighted male gamer, you must prepare yourself to be immersed in constant aural cues that this experience, like so many others, was not designed with you in mind.
Enongo Lumumba-Kasongo is a PhD student in the Department of Science and Technology Studies at Cornell University. Since completing a senior thesis on digital music software, tacit knowledge, and gender under the guidance of Trevor Pinch, she has become interested in pursuing research in the emergent field of sound studies. She hopes to combine her passion for music with her academic interests in technological systems, bodies, politics and practices that construct and are constructed by sound. More specifically she would like to examine the politics surrounding low-income community studios, as well as the uses of sound in (or as) electronic games. In her free time she produces hip hop beats and raps under the moniker Sammus (based on the video game character, Samus Aran, from the popular Metroid franchise).