Voicebox by Meta vs Audiobox

Side-by-side comparison · Updated April 2026

	Voicebox by Meta	Audiobox
Description	Voicebox is a generative AI model for speech from Meta AI researchers. It uses an approach called Flow Matching to learn from raw audio and the accompanying transcriptions, which lets it modify any part of a given audio sample. Meta reports that it outperforms existing models such as VALL-E and YourTTS in intelligibility, audio similarity, and processing speed. Voicebox was trained on 50,000 hours of public domain audiobooks in multiple languages and can perform tasks such as cross-lingual style transfer, noise removal, and content editing. Neither the model nor the code is publicly available because of the potential for misuse, though Meta has shared audio samples and research papers describing its capabilities.	Audiobox is Meta’s foundation research model for audio generation. It generates voices and sound effects from voice inputs and natural language text prompts. Audiobox includes specialized models such as Audiobox Speech and Audiobox Sound, which are built on the self-supervised Audiobox SSL model. Users can create custom audio for a range of applications. Interactive demos, Audiobox Maker, and research information are available for exploring its capabilities.
Category	Voice Modulation	Audio Editing
Rating	No reviews	No reviews
Pricing	N/A	N/A
Starting Price	N/A	N/A
Use Cases	Multilingual content creators; audiobook producers; podcasters; language learners	Content creators; game developers; educators; marketers
Tags	generative AI model; speech; Flow Matching; raw audio; intelligibility	voices; sound effects; voice inputs; natural language text prompts; audio generation
Features	Generative AI for speech; Flow Matching technique; Zero-shot text-to-speech; Cross-lingual style transfer; Noise removal; Content editing; Multiple language support; State-of-the-art performance; 50,000 hours of training data; Not publicly available due to ethical considerations	Generate voices and sound effects; Voice input and text prompt integration; Audiobox Speech for speech generation; Audiobox Sound for sound effects generation; Built on Audiobox SSL self-supervised model; Interactive demos available; Audiobox Maker for audio stories; Fairness and safety guardrails; Watermarked outputs for security; English language support