Voice Cloning — Power and Ethics

ElevenLabs can clone a voice from 30 seconds of audio. That's useful for accessibility — and dangerous in the wrong hands. Here's how to use it well.

30 seconds of audio → a digital voice

ElevenLabs v3 (released in 2025 as an alpha) is widely treated as the current gold standard. Feed it 30-60 seconds of clean audio of a voice and it produces a synthetic voice that can say anything in that voice, in 30+ languages, with realistic emotion, including whispering, laughing, and singing. OpenAI's voice models are close competitors; 11ai is ElevenLabs' own voice assistant, not a rival product.
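Before uploading a recording, it can help to confirm that it is long enough. The sketch below uses only Python's standard `wave` module and assumes the sample is an uncompressed PCM WAV file. The helper names and the 30-second default are illustrative, taken from the figure quoted above rather than from any ElevenLabs requirement.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def has_enough_audio(path: str, min_seconds: float = 30.0) -> bool:
    """True if the recording is at least `min_seconds` long.

    The 30-second default mirrors the lower bound mentioned in this
    lesson; it is an assumption, not an official limit.
    """
    return wav_duration_seconds(path) >= min_seconds
```

Duration is only one part of a usable sample. Whether the audio is clean (no music, echo, or overlapping speakers) still has to be checked by ear.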

Legitimate uses

Accessibility — people losing their voice to ALS can bank a voice before it's gone (ElevenLabs partners with ALS Association).
Audiobook narration — authors voice their own books in multiple languages.
Podcast cleanup — fix flubbed words without re-recording.
Character voices for indie game developers.
Dubbing — translate your video into 30 languages in your own voice.

Abuses that are illegal and wrong

Cloning a family member to run a 'grandparent scam' call.
Cloning a CEO to authorize wire transfers (already caused multi-million-dollar losses in 2024-2025).
Fake political robocalls (the FCC ruled AI-generated voices in robocalls illegal in February 2024, and several US states have added their own criminal penalties).
Cloning a person without consent for any purpose — harassment in 46 US states.

Using the ElevenLabs API

Python call to ElevenLabs v3 text-to-speech.

```python
from elevenlabs.client import ElevenLabs
from elevenlabs import play

client = ElevenLabs(api_key="YOUR_KEY")  # replace with your own API key

# Use an existing voice (your own, cloned with consent)
audio = client.text_to_speech.convert(
    voice_id="your_voice_id_here",
    model_id="eleven_v3",  # v3 alpha, 2025
    text="Hello, this is my voice reading a message I wrote.",
    voice_settings={
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.3,
    },
)

play(audio)
```
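To keep the result instead of only playing it, the audio can be written to disk. This is a minimal sketch that assumes `convert` hands back the audio as an iterable of byte chunks, which is how recent versions of the official Python SDK behave; check this against the version you have installed. `save_audio` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterable


def save_audio(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to `path` and return the number of bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # ignore empty chunks
                out.write(chunk)
                total += len(chunk)
    return total
```

With the call above, this would be used as `save_audio(audio, "message.mp3")`. If `audio` is a one-shot iterator, it can be consumed only once, so save it in place of calling `play(audio)`, or collect the bytes first and reuse them.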

Spotting voice clones

1. Robotic breathing patterns: real breath is irregular.
2. Perfect consistency: real voices vary volume, pace, and inflection.
3. Delays or hesitations when asked unexpected questions (if it's a live call).
4. Background absolutely silent: real calls have room noise.
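Two of these signs, unusually even loudness and a completely silent background, can be roughly measured. The sketch below is an illustration only, not a detector: it assumes the audio has already been decoded into a sequence of numeric samples, and the function names are hypothetical. Studio recordings and noise suppression can produce the same readings as a clone, so treat the numbers as a prompt to look closer.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_size: int) -> List[float]:
    """Root-mean-square loudness of each full, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return levels


def quietest_level(levels: Sequence[float]) -> float:
    """Loudness of the quietest frame, a rough stand-in for room noise."""
    if not levels:
        raise ValueError("need at least one full frame")
    return min(levels)


def loudness_variation(levels: Sequence[float]) -> float:
    """Coefficient of variation (standard deviation / mean) of loudness."""
    if not levels:
        raise ValueError("need at least one full frame")
    mean = sum(levels) / len(levels)
    if mean == 0:
        return 0.0  # pure silence: no variation to measure
    variance = sum((x - mean) ** 2 for x in levels) / len(levels)
    return math.sqrt(variance) / mean
```

A quietest level of exactly zero points to a background with no room noise at all, and a variation near zero points to loudness that never changes. Neither has a reliable cut-off, which is why no threshold is hard-coded here.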

ElevenLabs v3 special features

Audio tags [laughs], [whispers], [sighs] — the model performs them.
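Emotional direction (happy, anxious, serious) through audio tags and the wording of the script; voice_settings.style controls how strongly the voice's style is exaggerated.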
Multilingual — the same voice speaks 30+ languages natively.
11ai agents — voice + LLM + real-time conversation for customer service.
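Audio tags sit inline in the text sent to the model. A small helper can separate the tags from the words that will actually be spoken, which is handy for proofreading a script before spending credits on it. This is a sketch under assumptions: the pattern below treats any short lowercase phrase in square brackets as a tag, and `split_tags` is a hypothetical helper, not an ElevenLabs function. It does not know which tags the model supports.

```python
import re
from typing import List, Tuple

# Assumed tag shape: lowercase words in square brackets, e.g. [laughs].
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def split_tags(script: str) -> Tuple[str, List[str]]:
    """Return (spoken text with tags removed, tags in order of appearance)."""
    tags = TAG_PATTERN.findall(script)
    spoken = TAG_PATTERN.sub("", script)
    spoken = re.sub(r"\s+", " ", spoken).strip()  # tidy leftover spaces
    return spoken, tags
```

For example, `split_tags("[whispers] Don't tell anyone. [laughs] Just kidding.")` separates the two tags from the sentence text, so a typo such as `[laughs` with a missing bracket shows up as stray text in the spoken part.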

Compare the options

Provider	Voice quality	Voice cloning speed	Consent policy
ElevenLabs v3	Industry-leading; emotion + singing.	30 sec → clone in minutes.	Voice verification challenge required.
OpenAI Advanced Voice	Very natural, conversational.	Limited custom voices; presets.	No user cloning in prod API.
Cartesia / Sonic	Very fast (real-time); good quality.	Quick clone.	Consent required.
Open-source (XTTS, StyleTTS2)	Decent; runs locally.	Depends on compute.	Self-policed.
