Voice cloning lets your agent speak in a specific voice instead of using a generic text-to-speech voice. You upload audio recordings of a speaker, Brainstormer trains a custom voice model from those samples, and the agent uses that voice to synthesize speech. Voice cloning is optional; agents work fine with the default TTS voice. You can add or change a voice at any time from the agent’s edit page.
Before you start
You need between 5 and 25 audio sample files. More samples generally produce a higher-quality clone.
Supported formats: MP3, WAV, M4A, FLAC, AAC (maximum 10 MB per file).
Best practices for recordings:
- Each sample should be 30 seconds to 2 minutes long
- Use a consistent, clear recording environment — minimize echo and background noise
- Maintain consistent volume and microphone distance across samples
- Cover a range of sentence types: statements, questions, and different emotional registers
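Before uploading, it can help to check a folder of recordings against the limits listed above. The following is a minimal sketch, not part of Brainstormer: the function names are hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and the duration check covers only uncompressed WAV files because that is what the Python standard library can read.

```python
from pathlib import Path
import wave

# Limits taken from the requirements above.
MIN_SAMPLES, MAX_SAMPLES = 5, 25
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac"}
# Assumption: "10 MB" means 10 * 1024 * 1024 bytes; the exact cutoff may differ.
MAX_BYTES = 10 * 1024 * 1024
# Recommended (not required) duration range, in seconds.
MIN_SECONDS, MAX_SECONDS = 30, 120


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of an uncompressed WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_samples(folder: Path) -> list[str]:
    """Return a list of problems; an empty list means the folder looks fine."""
    problems = []
    files = sorted(p for p in folder.iterdir() if p.is_file())
    if not MIN_SAMPLES <= len(files) <= MAX_SAMPLES:
        problems.append(
            f"expected {MIN_SAMPLES}-{MAX_SAMPLES} files, found {len(files)}"
        )
    for path in files:
        ext = path.suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{path.name}: unsupported format {ext or '(none)'}")
            continue
        if path.stat().st_size > MAX_BYTES:
            problems.append(f"{path.name}: larger than 10 MB")
        if ext == ".wav":
            try:
                seconds = wav_duration_seconds(path)
            except (wave.Error, EOFError):
                problems.append(f"{path.name}: not a readable WAV file")
                continue
            if not MIN_SECONDS <= seconds <= MAX_SECONDS:
                problems.append(
                    f"{path.name}: {seconds:.0f}s is outside the recommended 30-120s"
                )
    return problems
```

Calling `check_samples(Path("recordings"))` on a hypothetical `recordings` folder returns one message per problem, so an empty list means the samples meet the count, format, and size limits.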
Upload audio samples
Open the voice setup
Navigate to the Agents page and open the agent you want to add a voice to. Click the Voice tab in the agent editor.
If you are creating a new agent via Classic Setup, the voice upload interface is available in the Voice Setup step.
Click Create Custom Voice to open the upload interface.
Name your voice
Enter a voice name, for example "Customer Support Voice" or "Emma". This name identifies the voice in your agent settings.
Optionally add a description to help you remember what the voice sounds like or where the recordings came from.
Upload audio samples
Drag and drop your audio files onto the upload area, or click "click to browse" to open a file picker. You can upload multiple files at once.
The interface shows each uploaded file with its name and size. Remove any file by clicking the × button next to it.
Upload between 5 and 25 samples. The counter shows how many you have added out of the 25-sample maximum.
Adjust voice settings
Before submitting, configure the voice synthesis parameters:
| Setting | What it controls |
|---|---|
| Stability | How consistent the voice sounds across outputs. Higher stability = more uniform delivery, less expressiveness. |
| Similarity boost | How closely the synthesized voice matches the original recordings. Higher values stay closer to the source voice. |
| Style | How much stylistic exaggeration is applied. Lower values sound more neutral; higher values sound more expressive. |
| Speaker boost | Enhances the clarity and presence of the voice. Recommended for most use cases. |
The defaults (stability 75%, similarity 80%, style 20%, speaker boost on) work well for most voices. Adjust them if the synthesized output sounds too flat or too inconsistent.
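If you keep voice settings in your own notes or tooling, the four parameters and their defaults can be modeled as a small record. This is only an illustrative sketch: the class and field names are hypothetical rather than a Brainstormer API, and the 0 to 100 percent range is an assumption based on the defaults being given as percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Illustrative container; names are hypothetical, not a Brainstormer API."""

    stability: int = 75         # percent; higher = more uniform, less expressive
    similarity_boost: int = 80  # percent; higher = closer to the source voice
    style: int = 20             # percent; higher = more stylistic exaggeration
    speaker_boost: bool = True  # on by default

    def __post_init__(self) -> None:
        # Assumption: each percentage setting accepts values from 0 to 100.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")


defaults = VoiceSettings()
# Example values only: lower stability and raise style if the output sounds flat.
more_expressive = VoiceSettings(stability=60, style=35)
```

The direction of each adjustment follows the table: lowering stability or raising style trades consistency for expressiveness, while raising stability does the opposite. The specific numbers in `more_expressive` are placeholders, not recommendations.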
Create the voice
Click Create Voice. Your audio samples are uploaded and voice training begins.
Training typically takes 10–30 minutes depending on the number of samples. You do not need to stay on the page; the voice will be ready when training completes.
Once training finishes, the voice status changes from Training to Ready.
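Because training runs in the background, a script that depends on the voice may want to wait for the Ready status. This page does not describe a programmatic way to read the status, so the sketch below takes a caller-supplied `get_status` function as a placeholder. That function, the polling interval, and the timeout are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 45 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Ready" or the timeout elapses.

    get_status is a hypothetical callable supplied by the caller; it should
    return the voice status as shown in the dashboard ("Training" or "Ready").
    Returns True if the voice became ready, False on timeout.
    """
    waited = 0.0
    while True:
        if get_status() == "Ready":
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

The default timeout of 45 minutes is an arbitrary margin above the typical 10–30 minute training time; passing `sleep` as a parameter makes the helper easy to exercise without real delays.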
After the voice is ready
Once training is complete, the voice is automatically linked to the agent. The agent uses it whenever voice output is requested, both in the dashboard chat interface and during API conversations.
You can open the agent’s Voice tab at any time to adjust stability, similarity, style, or speaker boost. Changing these settings does not require retraining; only uploading new samples triggers a new training run.