Audio's Cambrian Moment: Why We Invest in Audio Technology

M13’s investment thesis is focused on the new tools for audio creation and consumption.

Last Updated: May 17, 2021

Published: October 14, 2020


After decades of video dominance, voice is the new killer app.

“Voice is the new killer app,” AT&T’s head of technology recently commented about the renewed popularity of voice calls since the COVID-19 pandemic.

Following the outbreak, cellphone carriers reported daily call volumes double that of the busiest day of a typical year (Mother’s Day, naturally). The spike reflects a broader resurgence of audio and voice that shows every sign of sticking around after the pandemic is over.

After 70 years of visual dominance—dating back to when television overtook the radio—audio is making its comeback. Whether getting directions on a smartphone, shopping tips via smart speaker, or on-the-go inspiration from a podcast, consumers are integrating audio into their lives in a way we haven’t seen since the Golden Age of Radio.

While the pandemic catalyzed some of the gains, a mix of audio ecosystem development and new tools for audio creation and consumption is driving voice’s staying power. If you don’t believe me, check out the long lists of new audio apps that have broken onto the scene here and here.

Audio and voice are experiencing a Cambrian moment that parallels the rise of photo sharing, video, and blogging over the past 15 years, following the introduction of Facebook (2004), YouTube (2005), and Twitter (2006). Just as those platforms introduced new mediums for sharing content, new startups are introducing more audio-first experiences to consumers, and we believe we’re in the early stages of a platform migration from video to audio.

Audio ecosystem development

Fueled by innovation in smart speakers, speech recognition, earphones, and audio content, founders are now able to develop applications that showcase audio’s full potential. Consider these stats and their consequences:

  • One of every three American households has a smart speaker, according to Voicebot. This equates to nearly 90 million U.S. households, a staggering 85% increase since 2018. As a result, more consumers than ever before are getting used to issuing voice commands.
  • Apple could sell 90 million AirPods in 2020, more than a 150% increase from 2018, making AirPods one of the company’s fastest-selling products ever. As a result, the voice recognition technology in AirPods and AirPods Pro lets more consumers experience truly hands-free mobile applications for the first time.
  • A quarter of U.S. adults listen to a podcast weekly, a 40% increase from 2018, according to Edison. A record 100 million Americans listen to a podcast on a monthly basis. These listeners are now able to choose from over 1 million podcasts, up 60% since 2018. As a result, more Americans are creating and consuming audio-native content than ever before.

Audio’s advantages

Audio and voice hold a number of advantages over the visual media that have historically dominated the mobile market:

1) Audio is hands-free. Audio and voice platforms fit nicely into daily routines in which reading, viewing photos, or watching video is less than ideal (or in the case of driving and walking, potentially dangerous). Whether commuting, cardio-burning, cooking, or crocheting, audio allows users to communicate quickly and effectively while keeping their hands and eyes free for other activities.

2) Audio communicates emotion. The lack of emotion in text is what led SoftBank (then called J-Phone) to introduce the emoji in 1997 (the original graphics included the iconic “pile of poo” that’s now standard on smartphone keyboards). The popularity of emojis reflects our desire to embed emotion in our written communication. Audio communication allows us to convey that emotion in a much more intimate and nuanced way through our tone, volume, and verbal expressions.

3) Audio gets us away from screens. That’s something more of us crave as we experience “Zoom fatigue” at work and seek breaks from the screens that command our attention for most of our waking hours. It’s estimated that adults spend 11 hours every day in front of a screen between work and home usage. If you need any more incentive to take a break from your screen, give the new Netflix documentary “The Social Dilemma” a shot.

4) Audio is efficient. As anyone with a “smart” remote knows, it’s much faster to search with our voice than our thumbs. Voice search is 3.7x faster than text: the average person can speak 110-150 words per minute but can only type 38-40 words per minute. This helps explain why 27% of the world’s online population is using voice search on mobile devices.

M13’s audio and voice investments

We began to develop our audio and voice thesis 18 months ago as we saw these trends in the ecosystem take shape. Since that time, we have met with dozens of entrepreneurs who are developing innovative tools, content, and platforms that showcase audio’s exciting potential.

We have backed three incredible founders and companies attacking different consumer use cases in the space. All three companies remain in stealth mode, and each has a distinct angle on the market. We’re eagerly looking forward to the full consumer launches. I’ll preview one of these companies below.

Content creation and discovery

Audio content creation and consumption are two areas one of our portfolio companies aims to improve. We believe there is ample opportunity for audio-first apps that help users discover, create, and share audio content.

While podcasting is poised to grow 30% this year and become a $1 billion market, it is highly concentrated on both the supply and demand sides. On the supply side, there are over 1 million podcasts available, but 95% have fewer than 9,000 downloads, creating a long tail of content that is barely listened to. On the demand side, the audience driving this growth remains concentrated among “super fans” who can navigate legacy platforms and enjoy longer content. Most podcast listeners indicate they give up on a typical show within 15 minutes, yet the average show length remains 40 minutes, an eternity for young listeners. Also, most listeners discover new shows by reading the show description, rather than listening to a snippet.

Short-form audio and improved discoverability are the keys to unlocking the potential growth—two areas our portfolio company addresses head on.

The beta product, which remains in stealth, takes existing full-length content and condenses it into short clips that are served to users in a continuous flow, creating a frictionless discovery experience akin to TikTok’s interface. The company developed machine learning models to detect the most engaging segment of full-length audio content, serving listeners the optimal 1- to 2-minute preview. The more time a listener spends on the platform, the better the app gets at surfacing content for them. When a user hears a snippet they find interesting, they simply give their AirPods a voice command to start playing the full episode.

As short-form consumption becomes ubiquitous on the platform, the company anticipates that creators and publishers (both independent and corporate) will create de novo short clips of audio content via the platform. This will enable faster consumption and easier creation of all audio content (i.e., more people can create short-form audio for their listeners, not only within the podcast medium).

Stay tuned for public launches

We look forward to the public launches of all our stealth audio companies. Stay tuned for these exciting press releases in the coming months so you can start using them yourself. In the meantime, you can find a list of our full M13 portfolio here.