Mark Zuckerberg is ramping up development of voice AI at Meta, with a focus on improved voice features in the upcoming Llama 4 release. The company wants its AI to be built around natural conversation, enabling seamless dialogue between the user and the system. Meta, vying for leadership in AI, is also considering paid subscriptions and advertising to commercialize the technology.
According to the Financial Times' sources, the company plans to build improved voice capabilities into its latest open-source language model, Llama 4, which is set to be released in the coming weeks. Meta is betting that so-called AI-powered agents will be driven by conversation rather than text.
Meta focuses on voice dialogues instead of rigid answers
The company is focusing on making conversation with the voice model feel like a natural two-way dialogue, allowing users to interrupt the model rather than exchanging rigid questions and answers.
The move toward voice AI comes as Zuckerberg has outlined ambitious plans to make Meta a leader in AI, citing 2025 as a milestone for the company’s many AI-based products. Meta is racing against competitors like OpenAI, Microsoft and Google to commercialize the technology.
As such, the company is considering testing paid subscriptions for its Meta AI assistant, offering agent-based features such as making bookings or creating videos, the Financial Times reported. It is also considering placing paid ads or sponsored posts in the assistant's search results.
Zuckerberg has also revealed plans to build an AI agent with the programming and problem-solving skills of a mid-level engineer, which he believes could create a “very large market.”
Meta declined to comment to the Financial Times.
Native speech instead of text translation
Meta's chief product officer, Chris Cox, outlined some of the plans for Llama 4 on March 5, saying it would be an “omni model” in which speech is “native… rather than translating voice to text, sending the text to an LLM, receiving text back and converting it to speech.”
Speaking at a Morgan Stanley technology conference, he added: “I believe it's a huge thing for the interface product, the idea that you can talk to the internet and just ask it anything. I think we still don't fully realize how powerful that is.”
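For readers unfamiliar with the distinction Cox is drawing, the sketch below contrasts a conventional cascaded voice pipeline with the “native” speech approach he describes. It is purely illustrative: every function name (transcribe, generate_text, synthesize_speech, omni_model) is a hypothetical placeholder, not Meta's or any real library's API.

```python
# Conceptual sketch only. All functions are hypothetical stand-ins used to
# illustrate the structural difference Cox describes, not real APIs.

def transcribe(audio: bytes) -> str:
    """Stand-in for an automatic speech recognition (ASR) model."""
    return "<transcribed user utterance>"

def generate_text(prompt: str) -> str:
    """Stand-in for a text-only LLM."""
    return f"<text reply to: {prompt}>"

def synthesize_speech(text: str) -> bytes:
    """Stand-in for a text-to-speech (TTS) model."""
    return text.encode()

def omni_model(audio: bytes) -> bytes:
    """Stand-in for a single speech-native ('omni') model."""
    return b"<spoken reply generated directly from audio>"

def cascaded_assistant(audio_in: bytes) -> bytes:
    # Conventional pipeline: voice -> text -> LLM -> text -> voice.
    # Tone, emphasis and the ability to interrupt are lost at each text hop.
    text_in = transcribe(audio_in)
    text_out = generate_text(text_in)
    return synthesize_speech(text_out)

def native_assistant(audio_in: bytes) -> bytes:
    # The approach Cox describes: one model consumes and produces audio
    # directly, with no intermediate text round-trip.
    return omni_model(audio_in)

if __name__ == "__main__":
    sample = b"<microphone audio>"
    print(cascaded_assistant(sample))
    print(native_assistant(sample))
```

The practical difference is that a single speech-native model can, in principle, react to interruptions and preserve tone, since nothing is flattened into text mid-conversation.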
The discussions come amid a wave of competing launches and warnings from the newly appointed US “AI czar” David Sacks, a Silicon Valley venture capitalist who has said he wants to ensure America's AI models are not politically biased or “woke.”
Competition in the AI voice assistant market
OpenAI introduced a voice mode last year with a focus on giving it distinct personalities, while Grok 3, created by Elon Musk's xAI and available on the X platform, introduced voice features for select users late last month.
The Grok model is specifically designed to have fewer safeguards, including a “no-holds-barred mode” that intentionally responds in a way that the company says is “controversial, inappropriate, and offensive.”
Last year Meta unveiled a less “sanctimonious” version of its AI model in the third iteration of Llama, following criticism that Llama 2 refused to answer harmless questions.
Voice interaction with an AI assistant is a key feature of Meta's Ray-Ban smart glasses, which have recently become a hit with consumers. The company is accelerating its plans to build lightweight headsets that could replace smartphones as consumers' primary computing device.