Robin Williams and the Shazbot Over the First Podcast
Histories of technology have politics. The way we discuss the emergence and development of media technologies implicates the priorities and interests of those telling the story, and how we understand a technology’s meaning and potential.
Among podcasters familiar with the history of the medium, Dave Winer – the developer behind the RSS feed – is usually credited as the progenitor of the form. This past summer, however, this narrative was challenged by Podnews editor James Cridland (good-naturedly, I presume), who suggested that the comedian Robin Williams may actually have been the first podcaster, predating Winer’s RSS (“Rich Site Summary,” or “Really Simple Syndication”) distribution model by a few months. These origin stories have important technical differences that lead to political repercussions: the Winer narrative envisions podcasting as open and decentralized, and therefore theoretically an inherently emancipatory technology. The Williams narrative, in contrast, locates the birth of the medium within a closed, corporate-controlled platform – which just might mean there’s nothing inherently open or democratic about internet-distributed audio content at all.
Though both perspectives are undoubtedly “great white man” visions of the medium’s history – or, more precisely, versions of Susan Douglas’s “inventor-hero” – what’s particularly interesting here is how both views implicate a politics of what podcasts are and what they ought to be. Although this quarrel was a dispute between colleagues that was ultimately abandoned, I argue it’s well worth a deeper examination, as the ideological conflict at its center isn’t just about the past, but rather competing visions of podcasting’s future – over the continued flourishing or gradual eclipse of RSS.
Indeed, debates over the technical definition of a podcast, and over who was – and who was not – the first podcaster based on that definition, reveal anxieties among long-time podcasters and developers about corporate consolidation in the industry as well as the apparent irrelevance of technical distinctions to listeners and creators who may not appreciate the way in which walled gardens negate the very thing that makes podcasting so special. Likewise, to suggest that podcasting may have first emerged as a proprietary form may retroactively justify corporate platform enclosures in the present. And, though I’m just as suspicious of corporate hegemony as the next person, nuancing the early history of the medium can help us think through the distinctions between technology and cultural form.
In the consensus version of podcasting’s history, the emergence of the medium is typically traced to software developer Dave Winer’s publication – with significant contribution from the former MTV VJ and Internet entrepreneur Adam Curry – of RSS (“Rich Site Summary,” or “Really Simple Syndication”) version 0.92 in December 2000, which allowed for the distribution of digital audio files. The first podcast feed followed in January 2001, and, with the launch of Curry’s iPodder podcast aggregator and his program Daily Source Code in 2004, podcasting began to coalesce as both technology and cultural form. In the 20-odd years since, the medium’s technical infrastructure has remained essentially unchanged: RSS continues to be the predominant format of podcast syndication.
So this past July, when Podnews editor James Cridland cheekily suggested that it was not Dave Winer, nor “the podfather” Adam Curry, but comedian Robin Williams who had actually been the world’s first podcaster, industry graybeards were quick to push back on his claim.
Cridland’s argument went like this: As an early investor in Audible.com, Williams launched a bi-weekly talk show called RobinWilliams@Audible in early 2000 (several months before Winer’s pioneering RSS), which listeners could download onto their mp3 players. Subscribers who owned an Audible Mobile Player could even have RobinWilliams@Audible automatically pushed to their device. “Of course, that’s what the first podcast was, too,” Cridland noted, “something you downloaded to your computer, then synched to your mp3 player.”
The crucial distinction, however, was that RobinWilliams@Audible was not distributed via RSS. For some, this meant that the show was definitively not a podcast – and Cridland’s claim patently absurd.
On The New Media Show, for instance, Todd Cochrane, founder-CEO of Blubrry, and Rob Greenlee, VP of Libsyn, spent nearly eighteen minutes on the subject, recounting the early history of online file sharing and concluding that a podcast could only be a podcast if it used RSS. For Audible to suggest that they had been the first in podcasting (Cridland’s post relied in part on Audible founder Don Katz as a source) was ego-driven revisionism.
On Twitter (an ancient social media app where people used to go to eviscerate each other), Cridland’s article provoked a squall of exceptions, which generally argued that downloadable audio without RSS does not a podcast make; and though Audible’s platform may have been innovative, and even shared some characteristics with podcasting, the fact that its programs were limited to the company’s proprietary platform meant that they were definitively not podcasts.
Rob Greenlee, for example, replied to Cridland’s article by clarifying that Audible was a precursor platform for RSS, but that its audio programs were definitively not podcasting. When Cridland pushed back, noting the automatic download feature on Audible, Greenlee’s co-host Todd Cochrane replied that this feature still did not make RobinWilliams@Audible a podcast; and he insisted that he wasn’t going to budge on this point. A minor flap ensued, which ended with Cridland resignedly saying that he wished he had never written the article in the first place. In the end, even Dave Winer got involved, arguing that a piece of downloadable audio media had to have an RSS feed and be open to anyone, using any client, to qualify as a podcast.
To get a sense of the response to Cridland’s article on Twitter, and to let participants speak for themselves, I have selected a sampling of replies to Cridland’s original tweet teasing the article and reproduced them below. The conversation is arranged roughly in chronological order.
Admittedly, this was a very niche dispute – a handful of predominantly white tech dudes arguing over which white dude(s) had been the first podcaster. After a day or two, they all moved on.
But however minor (and however much Cridland may have wished he hadn’t written the article), the flap over RobinWilliams@Audible is a useful lens through which to understand contemporary debates over the future of podcasting: about whether the decentralized and open RSS-based ecosystem will long endure, or whether walled gardens – “limited set[s] of technology or media information provided to users with the intention of creating a monopoly or secured information system” – will prevail.
To better understand, however, let’s back up a bit.
By the fall of 2000, Dave Winer had earned a reputation as a pioneer of web syndication – he had been credited with launching the first blog – and someone who, according to the podcaster and author Eric Nuzum, “believed in making systems open, democratic, and easily accessible,” pushing back against the trend toward centralization and proprietary control of Internet infrastructures.
On a trip to New York that October, Winer met up with Adam Curry, who had been closely following his work. Over several hours in Curry’s hotel room, the entrepreneur attempted to convince Winer that web syndication technologies could be leveraged to distribute audio and video files – a vision of the Internet as “Everyman’s broadcast medium” – if only the so-called “last yard” problem of slow DSL connections could be resolved. By his own admission, Winer at first didn’t quite understand what Curry had in mind, but he was open to experimenting with using RSS as “virtual bandwidth” that could deliver large media files during off-peak hours. In January 2001, Winer successfully used an RSS enclosure tag to distribute a single Grateful Dead song (it was U.S. Blues), inaugurating the first podcast feed – though what he had created wouldn’t become known as a “podcast” for some time.
Though interest in RSS-delivered audio files was slow to develop (indeed, even Winer and Curry pursued other projects for a time), “it was not lost on … early adopters,” as Andrew Bottomley has observed, adding “that the technology shifted power to the audience and also opened up opportunities for more democratized radio production” (111-112). The days of corporate gatekeepers exercising oligopolistic control over the production and distribution of audio content seemed numbered; no longer would broadcasting be subject to an economy of scarcity. Theoretically anyone with web hosting, a microphone, and an RSS feed could set themselves up in the radio business.
Since those early days, RSS has become “the currency of podcasting,” to borrow a phrase from Dave Jones, Adam Curry’s Podcasting 2.0 collaborator. Indeed, as Cridland himself wrote in his primer, “What is a Podcast?,” technically speaking, a “podcast” is comprised of an audio file, without DRM restrictions, that is available to download, and is “distributed via an RSS feed using an <enclosure> tag.”
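To make that technical definition concrete, here is a minimal sketch in Python of what “distributed via an RSS feed using an <enclosure> tag” can look like from a listening app’s point of view. The sample feed is hypothetical (the show title, URLs, and file size are invented for illustration), and the helper name list_enclosures is my own; the sketch assumes a plain RSS 2.0 document and uses only Python’s standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the show title, URLs, and file size are invented for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <link>https://example.com/</link>
    <description>A made-up show used only to illustrate the format.</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/episode1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>A text-only post</title>
    </item>
  </channel>
</rss>
"""


def list_enclosures(feed_xml):
    """Return (episode title, file URL, MIME type) for each item that carries an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            # An item without an enclosure is an ordinary feed entry, not an episode.
            continue
        episodes.append(
            (item.findtext("title"), enclosure.get("url"), enclosure.get("type"))
        )
    return episodes


for title, url, mime_type in list_enclosures(SAMPLE_FEED):
    print(f"{title}: {url} ({mime_type})")
```

Nothing in the sketch depends on a particular company’s app or account: any client that can fetch the feed and read the enclosure’s url attribute can download the audio, which is the openness that RSS’s defenders have in mind.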
But RSS is not without its detractors. Last July, for instance, Anchor.fm co-founder Michael Mignano argued that while technical standards like RSS (or HTTP, or SMTP, or SMS) provide a “common language” that allows for the rapid spread of new technologies, standardization inevitably stifles growth. “The tradeoff,” he wrote, “is that a lower barrier to entry means more products get created in a category, causing market fragmentation and ultimately, a slow pace of innovation.” The consequence of this “Standards Innovation Paradox” is that even as podcast listening apps proliferate, because they must conform to the RSS standard, the differences between them are superficial. Proprietary systems, Mignano argued, offer an alternative, allowing developers the flexibility to build – and rapidly improve – dynamic user experiences.
Naturally, Mignano pointed to Spotify – which acquired Anchor in 2019 – as an example of how closed systems could break the “curse” of standardization: When the company began to expand from music to other forms of audio content, he wrote, there was some speculation that the company would launch a dedicated podcast app. But, “if they had done so, they’d have to contend with the aforementioned ocean of podcast listening apps which were all offering users roughly the same features that were limited by the standard.” Instead, “Spotify used their existing music user base inside of the existing Spotify app to distribute podcasts to hundreds of millions of users.”
But this framing soft-pedals Spotify’s aggressive attempts to steer podcasting away from RSS and toward platform enclosure. As John L. Sullivan argued in a 2019 paper, Spotify’s emphasis on exclusive releases (which has included the removal of content previously available via RSS, like The Joe Budden Podcast), and its $340 million acquisitions of Anchor and Gimlet are all part of an effort to control distribution and “maximize the ‘winner take all’ functions of platforms.” More recently, Anchor has stopped automatically generating an RSS feed at the time of publication, making it an opt-in function (meaning that creators have to know what RSS is to have their podcast distributed to directories other than Spotify). “We’ve been able to replace RSS for on-platform distribution,” noted one Spotify executive at a recent investor event, “which means that podcasts created on our platform are no longer held back by this outdated technology.”
Given the challenges that platform enclosure poses to RSS, its defenders’ insistence that “it’s not a podcast if it doesn’t have an RSS feed, and it’s not a podcast app if you can’t add your own RSS feeds,” as an episode title of Curry and Jones’s Podcasting 2.0 puts it, is understandable. Or, as Cochrane declared on The New Media Show, “until you tear my RSS feed through my dead hands, podcasts technically are podcasts that are delivered via RSS.”
And understandable, too, is the prickly reaction to Cridland’s alternate history: To claim that RobinWilliams@Audible may have been the first podcast is to suggest that RSS – and the open and democratic values which it represents – is inessential; and more troubling, that proprietary systems are deeply rooted in the history of the medium.
Of course, there’s also the sticky fact that RobinWilliams@Audible premiered before the word “podcast” entered the lexicon. But even this history is messy. In his original coinage, the technologist Ben Hammersley applied the term to a variety of different forms of downloadable audio media, including Audible originals like In Bed with Susie Bright. According to this early conception, in other words, podcasting described a cultural practice rather than a specific distribution infrastructure.
It is likely, too, that technological distinctions are irrelevant to listeners. Citing data from Edison Research showing that a significant percentage of listeners use Spotify and YouTube to access podcasts (even though content on these platforms doesn’t meet the strict technical definition of a “podcast”), Cridland has suggested that, for most people, podcasting is simply “on-demand audio. Like a radio show, but on-demand.”
Likewise, the question of who the first podcaster was is of narrow interest. “Who cares?” an exasperated Cochrane finally concluded.
But reviewing the pre-2004 history of downloadable audio media can open up questions of the interpretive flexibility of technology (how technological artifacts come to have different meanings for different groups of users) and rhetorical closure (when the need for alternative designs diminishes) that the late Trevor Pinch and Wiebe Bijker identified as key concepts in the Social Construction of Technology.
And so, rather than arguing about whether RobinWilliams@Audible – or, for that matter, Cochrane’s audio file sharing on FidoNet in the early 1990s – was the “first” podcast, further examination of this complex genealogy suggests the more interesting questions of how and why online distribution of audio files was such a desirable goal that there were several paths to its development.
The flap over Robin Williams and the question of the first podcaster also gives us much-needed insight into current discourse about corporate influence in the podcasting space. It also provided a way for proponents of the decentralized Podcasting 2.0 movement to make a technological distinction between a desire for freedom and a desire for control. While the scuffle itself was short-lived, its dust is far from settling.
Featured Image of Robin Williams (2008) by Flickr User Shameek (CC BY-NC-ND 2.0)
Andrew J. Salvati is an adjunct professor in the Media and Communications program at Drew University, where he teaches courses on podcasting and television studies. His research interests include media and cultural memory, television history, and mediated masculinity. He is the co-founder and occasional co-host of Inside the Box: The TV History Podcast, and Drew Archives in 10.
Voice as Ecology: Voice Donation, Materiality, Identity
I first heard about voice donation while listening to “Being Siri,” an experimental audio piece about Erin Anderson donating her voice to the Boston-based voice donation company VocaliD. Like a digital blood bank of sorts, VocaliD provides a platform for donating one’s voice via digital audio recordings. These recordings are used to help technicians create a custom digital voice for a voiceless individual, providing an alternative to the predominantly white, male, mechanical-sounding assistive technologies used by people who cannot vocalize for themselves (think Stephen Hawking). VocaliD manufactures voices that better match a person’s race, gender, ethnicity, age, and unique personality. To me, VocaliD encapsulates the promise, complexity, and problematic nature of our current speech AI landscape and serves as an example of why we need to think critically about sound technologies, even when they appear to be wholly beneficial.
Given the extreme lack of sonic diversity in vocal assistive technologies, VocaliD provides a critically important service. But a closer look at both the rhetoric used by the organization and the material process involved in voice donation also amplifies the limits of overly simplistic, human-centric conceptions of voice. For instance, VocaliD rhetorically frames their service by persistently linking voice to humanity—to self, authenticity, individuality. Consider the following statements made by Rupal Patel, CEO and founder of VocaliD, in which she emphasizes the need for voice donation technology:
“Here’s a way for us to acknowledge these individuals as unique human beings.” (Fast Company)
“I was talking to [a] girl we made a voice for. She told me that people are finally seeing her for who she really is.” (Medieros)
These are just a few examples from a larger discourse that reinforces the connection between voice and humanity. VocaliD’s repeated claims that their unique vocal identities humanize individuals imply that one is not fully human unless one’s voice sounds human. This rhetoric positions voiceless individuals as less than human (at least until they pay for a customized human-sounding voice).
VocaliD’s conflation of voice and humanity makes me wonder about the meaning of “human” in this context. For example, notions of humanity have been historically associated with Western whiteness—and deployed as a means of separating or distinguishing white people from Others—as Alexander Weheliye points out. Though VocaliD’s mission is to diversify manufactured voices, is a “human-sounding” voice still construed as a white voice? Does sounding human mean sounding white? Even if there is a bank of sonically diverse voices to choose from, does racial bias show up in the pacing, phrasing, or inflection caused by the vocal technology?
I am also disturbed by the rhetoric of humanity and individuality used by VocaliD because the company adopts the same rhetoric to describe the AI voices they sell to brands for media and smart products. Here’s an example of this rhetoric from the VocaliD AI website: “When you need a voice that resonates, evokes audience empathy, and sounds like you, rather than your competitors, VocaliD’s AI-powered vocal persona is the solution. Your voice — always on, where you need it when you need it.” Using similar rhetorical strategies to describe both voiceless people and products is dehumanizing. And yet, having a more diverse AI vocal mediascape, especially in terms of race, is crucially important since voice-activated machines and products are designed largely by white men who end up reinforcing the sonic color line.
Interestingly, the processes VocaliD uses to create a custom voice reveal that these voices are not, in fact, unique markers of humanity or individuality. It’s hard to find a detailed account of how VocaliD voices are made due to the company’s patents, but here are the basics: VocaliD does not transfer a donated voice directly to a voiceless person’s assistive technology. VocaliD technicians instead blend and digitally manipulate the donated voice with recordings of the noises a voiceless person can make (a laugh, a hum) to create a distinct new voice for the recipient. In other words, donated voices are skillful remixes that wouldn’t be possible without extracting vocal data and manipulating it with digital tools. Despite perpetuating narratives about voice, humanity, and authenticity, VocaliD’s creative blending of vocal material reveals that donated voices are the result of compositional processes that involve much more than people.
Further, considering VocaliD voices from a material rather than human-centric perspective amplifies something important about voices in general. All voices are composed of and grounded in an ecology. That is, voices emerge and are developed through a mixture of: (1) biological makeup (or technological makeup in the case of machines with voices); (2) specific environments and contexts (geography may determine the kind of accents humans have; AI voices have distinct sounds for their brands); (3) technologies (phones, computers, digital recorders and editors, software, and assistive technologies preserve, circulate, and amplify voices); and (4) others (humans often emulate the vocal patterns of the people they interact with most; many machine voices also sound like other machine voices). Put simply, all voices are intentionally and unintentionally composed over time—shaped by ever-changing bodily (and/or technological) states and engagements with the world. Voices are dynamic compositions by nature. Examining voice from a material standpoint shows that voices are not static markers of humanity; voices are responsive and malleable because they are the result of a complex ecology that involves much more than a “unique” human being.
However, focusing solely on the material aspects of vocality leaves out people’s lived experiences of voice. And based on online videos of VocaliD recipients—like Delaney, a seventeen-year-old with cerebral palsy—VocaliD voices seem to live up to the company’s hype. Delaney appears delighted by her new voice, stating: “I was so excited to get my own voice. I used to have a computer voice and now I sound like a girl. I like that. And I talk more.” Delaney’s teachers also discuss how her new voice completely changed her demeanor. Whereas before Delaney was reluctant to use her assistive technology to speak, her new voice gives her confidence and a stronger sense of identity. As her teacher explains in the video, “she is really engaged in groups, she wants to share her answers, she’s excited to talk with friends. It’s been really nice to see.” For Delaney, a VocaliD voice represents a newfound sense of agency.
It’s important to recognize this video is not necessarily representative of every VocaliD recipient’s experience, or even Delaney’s full experience. As Meryl Alper notes in Giving Voice, these types of news stories “portray technology as allowing individuals to ‘overcome’ their disability as an individual limitation, and are intended to be uplifting and inspirational for able-bodied audiences” (27). While we should be wary of the technological determinism in the video, observing Delaney use her VocaliD voice—and listening to the emotional responses of her mom and teachers—makes it difficult to deny that donated voices make a positive impact. For me, this video also gets at a larger truth about humans and voice: the ways we hear and understand our own voices, and the ways others interpret the sounds of our voices, matter a great deal. Voices are integral to our identities—to the ways we understand and think about ourselves and others—and the sounds of our voices have social and material consequences, as the SO! Gendered Voices Forum illustrates so clearly.
It’s worth repeating that VocaliD’s mission to diversify synthetic voices is incredibly important, especially given the restrictive vocal options available to voiceless individuals. It’s also necessary to acknowledge the company has limitations that end up reproducing the structural inequities it tries to address. As Alper observes, “In order to become a speech donor, one must have three to four hours of spare time to record their speech, access to a steady and strong Internet connection, and a quiet location in which to record” (162-63). With these obstacles to donating one’s voice in mind, it’s not surprising that all the VocaliD recipient videos I could find feature white people. Donating one’s voice is much easier for middle- to upper-class white people who have access to privacy, Internet, and leisure time.
This brief examination of VocaliD raises questions about what a more equitable future for vocal technologies might look/sound like. Though I don’t have the answer, I believe that to understand the fullness of voice, we can’t look at it from a single perspective. We need to account for the entire vocal ecology: the material (biological, technological, financial, etc.) conditions from which a voice emerges or is performed, and individual speakers’ understanding of their culture, race, ethnicity, gender, class, ability, sexuality, etc. An ecological approach to voice involves collaborating with people and their vocal needs and desires—something VocaliD models already. But it also involves accounting for material realities: How might we make the barriers preventing a more diverse voice ecosystem less difficult to navigate—especially for underrepresented groups? In short, we must treat voice holistically. Voices are more than people, more than technologies, more than contexts, more than sounds. Understanding voice means acknowledging the interconnectedness of these things and how that interconnectedness enables or precludes vocal possibilities.
Featured image: 366-350 You can’t shut me up, Jennifer Moo, CC BY-ND
Steph Ceraso is an associate professor of digital writing and rhetoric at the University of Virginia. Her 2018 book, Sounding Composition: Multimodal Pedagogies for Embodied Listening, proposes an expansive approach to teaching with sound in the composition classroom. She also published a digital book in 2019 called Sound Never Tasted So Good: ‘Teaching’ Sensory Rhetorics—an exploration of writing, sound, rhetoric, and food. She is currently working on a book project that examines sonic forms of invention in various contexts.