Voice cloning - Brainstormer

Voice cloning lets your agent speak in a specific voice instead of using a generic text-to-speech voice. You upload audio recordings of a speaker, Brainstormer trains a custom voice model from those samples, and the agent uses that voice to synthesize speech. Voice cloning is optional — agents work fine with the default TTS voice. You can add or change a voice at any time from the agent’s edit page.

Before you start

You need between 5 and 25 audio sample files. More samples generally produce a higher-quality clone.

Supported formats: MP3, WAV, M4A, FLAC, AAC (maximum 10 MB per file).

Best practices for recordings:

Each sample should be 30 seconds to 2 minutes long
Use a consistent, clear recording environment — minimize echo and background noise
Maintain consistent volume and microphone distance across samples
Cover a range of sentence types: statements, questions, and different emotional registers

Audio samples must be clear recordings of a single speaker. Mixed speakers, background music, or heavy compression significantly reduce clone quality.
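The file requirements above can be checked locally before uploading. The following is a minimal sketch, not part of Brainstormer: the function names are hypothetical, and it assumes that "10 MB" means 10 × 1024 × 1024 bytes. It checks only the sample count, file extension, and file size; it cannot judge recording quality or sample duration.

```python
from pathlib import Path

# Limits taken from the requirements listed above.
MIN_SAMPLES = 5
MAX_SAMPLES = 25
# Assumption: 10 MB is treated as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac"}


def check_samples(files):
    """Return a list of problems for (filename, size_in_bytes) pairs.

    An empty list means the set passes these basic checks.
    """
    problems = []
    if not MIN_SAMPLES <= len(files) <= MAX_SAMPLES:
        problems.append(
            f"expected {MIN_SAMPLES}-{MAX_SAMPLES} samples, got {len(files)}"
        )
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 10 MB")
    return problems


def check_directory(directory):
    """Convenience wrapper: check every regular file in a local directory."""
    files = [
        (p.name, p.stat().st_size)
        for p in Path(directory).iterdir()
        if p.is_file()
    ]
    return check_samples(files)
```

Calling `check_directory("samples/")` on a local folder (the path is only an example) returns a list of human-readable problems, or an empty list if nothing was flagged.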

Upload audio samples

Open the voice setup

Navigate to the Agents page and open the agent you want to add a voice to. Click the Voice tab in the agent editor.

If you are creating a new agent via Classic Setup, the voice upload interface is available in the Voice Setup step.

Click Create Custom Voice to open the upload interface.

Name your voice

Enter a voice name, for example Customer Support Voice or Emma. This name identifies the voice in your agent settings.

Optionally, add a description to help you remember what the voice sounds like or where the recordings came from.

Upload audio samples

Drag and drop your audio files onto the upload area, or click "click to browse" to open a file picker. You can upload multiple files at once.

The interface shows each uploaded file with its name and size. Remove any file by clicking the × button next to it.

Upload between 5 and 25 samples. The counter shows how many you have added out of the 25-sample maximum.

Adjust voice settings

Before submitting, configure the voice synthesis parameters:

Setting	What it controls
Stability	How consistent the voice sounds across outputs. Higher stability = more uniform delivery, less expressiveness.
Similarity boost	How closely the synthesized voice matches the original recordings. Higher values stay closer to the source voice.
Style	How much stylistic exaggeration is applied. Lower values sound more neutral; higher values sound more expressive.
Speaker boost	Enhances the clarity and presence of the voice. Recommended for most use cases.

The defaults (stability 75%, similarity 80%, style 20%, speaker boost on) work well for most voices. Adjust if the synthesized output sounds too flat or too inconsistent.
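If you track voice settings outside the dashboard, for example in your own configuration files, the defaults above can be captured in a small structure. This is an illustrative sketch only: the class and field names are hypothetical and do not represent a Brainstormer API, and it assumes the percentages map onto a 0.0 to 1.0 scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container; field names are illustrative only."""

    stability: float = 0.75         # higher = more uniform, less expressive
    similarity_boost: float = 0.80  # higher = closer to the source recordings
    style: float = 0.20             # higher = more stylistic exaggeration
    speaker_boost: bool = True      # recommended for most use cases

    def __post_init__(self):
        # Assumption: each percentage is expressed as a fraction of 1.0.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(
                    f"{name} must be between 0.0 and 1.0, got {value}"
                )


# The documented defaults.
defaults = VoiceSettings()

# If output sounds too flat, one option is to lower stability and
# raise style a little; the exact values here are arbitrary examples.
more_expressive = VoiceSettings(stability=0.60, style=0.35)
```

Keeping the values in one validated object makes it harder to record an out-of-range setting by accident.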

Create the voice

Click Create Voice. Your audio samples are uploaded and voice training begins.

Training typically takes 10–30 minutes depending on the number of samples. You do not need to stay on the page; the voice will be ready when training completes.

Once training finishes, the voice status changes from Training to Ready.
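If you automate around this step, waiting for the status to change can be expressed as a polling loop with a timeout. This page does not describe a way to read the voice status programmatically, so `get_status` below is a hypothetical caller-supplied function that returns the status string ("Training" or "Ready"). The 45-minute timeout and 30-second interval are arbitrary choices based on the typical 10–30 minute training time.

```python
import time


def wait_until_ready(
    get_status,
    timeout_s=45 * 60,
    interval_s=30,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll get_status() until it returns "Ready" or the timeout elapses.

    get_status is a hypothetical callable supplied by the caller.
    Returns True if the voice became Ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Ready":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.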

Test the voice

When the voice is ready, click Test Voice to open the Voice Tester. Enter a sample phrase and click play to hear a real-time synthesis preview.

If the output does not sound right, you can re-upload samples or adjust the voice settings and retrain.

After the voice is ready

Once training is complete, the voice is automatically linked to the agent. The agent uses it whenever voice output is requested — in the dashboard chat interface and during API conversations. You can access the voice settings from the agent’s Voice tab at any time to adjust stability, similarity, style, or speaker boost without retraining.

Changing voice settings does not require retraining. Only uploading new samples triggers a new training run.
