In February, Spotify launched its first AI-powered feature with the debut of its AI DJ, an intelligent audio guide with a compellingly realistic voice. As it turned out, that AI character was based on a real person: Spotify’s head of cultural partnerships, Xavier “X” Jernigan, who was honored to become the first voice model for the AI feature.
TechCrunch sat down with Jernigan to find out more about the AI training process and Spotify’s future plans for its AI DJ efforts.
The new AI DJ personalizes the listening experience, curating a selection of music based on each listener’s interests. It also provides spoken commentary on the songs, just like a real radio host.
In addition to his leadership role at Spotify, Jernigan hosts several Spotify podcasts, including “The Window” and “Showstopper,” and he co-hosted the now-defunct “The Get Up.” So he is used to having his voice heard by millions of listeners. Still, having his voice immortalized as an AI is a unique experience.
Spotify chose Jernigan to be the first voice model because his “voice and personality have already resonated with a lot of our listeners,” Jernigan told TechCrunch. “(The company was) pretty sure I would resonate this way, too.”
Spotify’s morning show “The Get Up” garnered nearly 6 million listeners and was one of the top 10 podcasts on Spotify before it ended abruptly in 2022, proving Jernigan’s appeal.
Still, being the DJ voice model was hard to grasp at first, the podcast host admitted.
“They asked me to be this DJ voice model and it blew my mind when they explained it to me,” Jernigan told us. “Imagine hearing this for the first time with nothing to look at. I’m like, ‘Wait, what? It will be me, but it’s not me, and it’s text and voice, but it will sound like me, and it’s AI?’”
“For me, it was a new experience to work with AI in this way. I was blown away,” he added.
Spotify says that its AI DJ was built using Sonantic and OpenAI technologies.
Sonantic is an AI startup that Spotify acquired last year. The company’s technology was responsible for creating realistic AI-based voices, including the one used for the voice of Val Kilmer in “Top Gun: Maverick.”
Before the acquisition, Spotify spent a few years researching AI-powered technology and worked on the DJ feature “in some iteration,” Jernigan said. He declined to share exactly how long the process took, but said the Sonantic technology integration “really got it going.”
Jernigan explained the process of training the AI, which involved walking into a studio, reading a script, and speaking in various cadences and inflections to convey different emotions. He also fed the AI words and phrases that he personally uses to make it feel as authentic as possible.
“We use the words that I say… I don’t say ‘melody’ for the songs. This is not how I talk,” he said. “I say ‘hits’ or ‘bangers.’ So you’ll hear DJ say those kinds of words,” Jernigan continued. “We even did a whole process of, how do I say ‘hello’? We would carry a notebook and just write down the different phrases that I would say.”
He added that the Spotify team made sure to keep his pauses and breaths natural so that the AI’s voice truly sounded human.
Even Jernigan’s mom gave her stamp of approval to the results.
“(DJ) passed the mom test. I played it for her before it came out, explained it to her, and tried to make her understand it,” he said. “She listens to all my podcasts, so she’s used to hearing my voice recorded and played back, and she said, ‘That sounds exactly like you.’ My mom said it sounded like me, so I knew it was right.”
Realistic AI voices already exist, but we’d say Spotify’s DJ sounds calmer and more relaxed than the others we’ve heard. Google’s Duplex technology may sound authentic, but it’s not necessarily a pleasant voice to listen to when you’re trying to vibe to your summer playlist.
“For me, coming at it from a voice-acting point of view, my goal was to connect and converse with people and to think of a person. So when I was training the AI, I just envisioned a person in the studio with me, talking to them and being their friend,” he added.
In addition to making the AI voice sound friendly to listeners, Spotify designed the DJ itself to feel approachable.
The animated green circle that users see when listening to the DJ is a nod to the Spotify logo and moves like a mouth when the AI speaks.
“When it came to design, we thought about the entire experience: how it works, how it sounds, how it looks, and how to personalize it for each user,” Emily Galloway, Spotify’s head of product design for personalization, told TechCrunch. “Early on, on the visual side, we explored some options that felt more technical (think sound waves). However, this didn’t feel right, as we wanted to humanize the AI…”
“We wanted it to look and feel unique. In fact, it was so unique that it was granted a design patent,” Galloway added.
Jernigan contributed to the DJ in other ways besides recording his voice.
For the AI to provide expert commentary on music, Spotify created a writers’ room made up of curators, cultural experts, and music experts.
Jernigan has an extensive background in music, which is why he was also part of the writers’ room. He previously worked for top artists like Diddy, Amy Winehouse, and 2 Chainz, among others.
And while Jernigan is the first DJ voice model, there’s a chance listeners will hear more voices in the future.
TechCrunch asked Jernigan if the company had any plans to hire voice models who speak other languages.
“Stay tuned,” he hinted.
The AI DJ is currently available only in English, to Premium subscribers in the U.S. and Canada. As of February, the DJ feature is still in beta.
“We have a bunch of really cool new features across the board,” Jernigan said. “We have really cool stuff coming out.”