---
summary: "Create a voice from a natural-language description alone. No reference audio,\
  \ no consent recording, no cloned likeness \u2014 the engine generates a synthetic\
  \ speaker that fits your prompt."
title: Design a voice (no reference clip)
path: tutorials/design-a-voice
status: published
---

You can create a voice from text alone: describe the speaker in natural language and the engine generates a synthetic voice that matches. No reference clip, no consent recording, no risk of mimicking a real person. Useful when you want a fresh voice identity (brand mascots, IVR personalities, audiobook narrators that don't sound like anyone in particular).

This is the "Voice Design" mode. It contrasts with the cloned-voice path (see *Clone a voice and synthesise*) — same `/v1/speak` API surface afterwards, just a different intake.

## When to use Voice Design

- You want a voice that sounds **specific in character** (calm, warm, energetic, broadcaster-style) without picking one of the platform's preset voices.
- You **don't have rights** to a particular person's voice clip but want a custom identity.
- You're building a **brand voice** that should be platform-owned, not tied to a real speaker.
- You want to **iterate cheaply**: each new design takes seconds, no recording session needed.

If you have a real speaker whose voice you want to reproduce, use the cloned-voice path instead. Voice Design generates a *new* voice from the description; it doesn't try to match a person you've heard before.

## Prerequisites

- An API key with `scaispeak:voice.write` (tenant admins have it; otherwise grant explicitly).
- That's it. No audio recording, no consent capture, no legal review.

## 1. Write the description

The description shapes the voice — pace, tone, gender, accent, perceived age, emotional default. The engine reads it directly, so be specific about the qualities that matter to you.

Working examples:

- *"warm professional female narrator, calm pace, mid-Atlantic English accent"*
- *"energetic male radio host, fast cadence, slightly raspy"*
- *"older gentleman, deliberate pace, slight Scottish lilt, reading-grandfather warmth"*
- *"neutral broadcaster voice, no strong regional accent, clear and crisp"*

Leave the content out of the description. The prompt describes the *speaker*, not the *script*. The script comes later at synthesis time.

Minimum length is a handful of words. Single words like "calm" don't give the engine enough to work with — aim for at least one phrase describing pitch / pace / tone.
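A client-side pre-check can catch prompts that are too short before any request goes out. The sketch below is illustrative only: the `check_design_prompt` name and the three-word threshold are assumptions, since this page only says that a single word is not enough.

```python
MIN_WORDS = 3  # assumed threshold; the platform's actual minimum is not stated here


def check_design_prompt(prompt, min_words=MIN_WORDS):
    """Return the prompt with whitespace normalised, or raise if it is too short."""
    words = prompt.split()  # splits on any run of whitespace
    if len(words) < min_words:
        raise ValueError(
            f"design prompt has {len(words)} word(s); "
            f"describe pitch, pace, or tone in at least {min_words}"
        )
    return " ".join(words)
```

Calling `check_design_prompt("calm")` raises `ValueError`, while a phrase such as `"energetic male radio host, fast cadence"` passes through unchanged apart from whitespace normalisation.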

## 2. Create the voice

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/voices" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "display_name=Acme Brand Voice" \
  -F "language_primary=en" \
  -F "language_supported_json=[\"en\"]" \
  -F "gender_hint=female" \
  -F "voice_design_prompt=warm professional female narrator, calm pace, mid-Atlantic English accent"
```

```python
import httpx, os

r = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaispeak/voices",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    data={
        "display_name": "Acme Brand Voice",
        "language_primary": "en",
        "language_supported_json": '["en"]',
        "gender_hint": "female",
        "voice_design_prompt": (
            "warm professional female narrator, "
            "calm pace, mid-Atlantic English accent"
        ),
    },
)
r.raise_for_status()
voice = r.json()["data"]
print(voice["voice_id"], voice["embedding_status"])
# → embedding_status: ready (immediately)
```

```javascript
const form = new FormData();
form.append("display_name", "Acme Brand Voice");
form.append("language_primary", "en");
form.append("language_supported_json", '["en"]');
form.append("gender_hint", "female");
form.append("voice_design_prompt",
  "warm professional female narrator, calm pace, mid-Atlantic English accent");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaispeak/voices`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
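// fetch does not reject on HTTP error statuses, so check explicitly
if (!res.ok) throw new Error(`Voice creation failed: HTTP ${res.status}`);
const { data: voice } = await res.json();
console.log(voice.voice_id, voice.embedding_status);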
```

Notable differences from the cloned-voice path:

- **No `reference` file part.** Don't include one — sending both `voice_design_prompt` *and* `reference` is rejected with `SCAISPEAK_AMBIGUOUS_INTAKE_MODE`.
- **No `consent_*` fields.** Designed voices don't represent a real person, so there's no consent to capture.
- **No tokenisation wait.** The voice lands at `embedding_status: ready` in the same request — usable on the very next `/v1/speak` call.
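The first difference can be enforced before the request is sent. The helper below is a hypothetical sketch, not part of any SDK: it builds the form fields used in step 2 and refuses a `reference` file part locally, rather than waiting for the server to return `SCAISPEAK_AMBIGUOUS_INTAKE_MODE`. Treating `gender_hint` as optional and taking `language_primary` from the first listed language are both assumptions.

```python
import json


def designed_voice_form(display_name, design_prompt, languages=("en",),
                        gender_hint=None, files=None):
    """Build form fields for a designed voice (no reference clip)."""
    if files and "reference" in files:
        # The server would reject this mix with SCAISPEAK_AMBIGUOUS_INTAKE_MODE.
        raise ValueError("a designed voice must not include a 'reference' file part")
    fields = {
        "display_name": display_name,
        "language_primary": languages[0],  # assumption: first language is primary
        # json.dumps avoids hand-escaping the JSON array inside a form field
        "language_supported_json": json.dumps(list(languages)),
        "voice_design_prompt": design_prompt,
    }
    if gender_hint:  # assumption: the field can be omitted
        fields["gender_hint"] = gender_hint
    return fields
```

The returned dict can be passed as `data=` to the `httpx.post` call shown earlier.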

## 3. Synthesise

Use it the same way as any other voice. The examples below assume `$VOICE_ID` holds the `voice_id` returned in step 2:

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/speak" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "'$VOICE_ID'",
    "text": "Welcome to Acme. Your call is being routed to the next available agent.",
    "response_format": "wav"
  }' \
  | python -c "import sys,json,base64; \
    b=json.load(sys.stdin)['data']['audio_base64']; \
    open('greeting.wav','wb').write(base64.b64decode(b))"
```

Play `greeting.wav`. The voice will match the description — same character on every subsequent call, so a brand voice stays consistent across utterances.
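The inline `python -c` decoder in the curl pipeline can also be written as a small function. This is a minimal sketch that assumes the response shape implied by that pipeline (`data.audio_base64`); the `extract_audio` name is made up for illustration.

```python
import base64
import json


def extract_audio(response_body):
    """Decode the base64 audio from a speak response body (str or bytes)."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["data"]["audio_base64"])
```

Writing the returned bytes to a file opened in binary mode (`"wb"`) produces the same `greeting.wav` as the shell version.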

## 4. Per-call delivery (optional)

The same per-call control fields work on designed voices — `instructions`, `cfg_value`, `warmup_trim_ms` from the clone tutorial all apply:

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/speak" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "'$VOICE_ID'",
    "text": "Press one for sales, two for support.",
    "response_format": "wav",
    "instructions": "slow and reassuring"
  }'
```

The voice keeps its identity from the design prompt; the `instructions` field changes the per-call delivery (pace, emotion, emphasis).
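Building the request body in code avoids the shell quoting around `$VOICE_ID` and keeps optional delivery fields out of the payload when they are unset. The sketch below is hedged: `speak_body` is a hypothetical helper, and the value types of `cfg_value` and `warmup_trim_ms` are not specified on this page, so they are passed through untouched.

```python
import json

# Per-call delivery fields named in this tutorial; their value types are assumed.
DELIVERY_FIELDS = {"instructions", "cfg_value", "warmup_trim_ms"}


def speak_body(voice_id, text, response_format="wav", **delivery):
    """Serialise a speak request body, adding only delivery fields that are set."""
    unknown = set(delivery) - DELIVERY_FIELDS
    if unknown:
        raise ValueError(f"unsupported delivery fields: {sorted(unknown)}")
    body = {"voice_id": voice_id, "text": text, "response_format": response_format}
    # Drop fields left as None so the server applies its own defaults.
    body.update({k: v for k, v in delivery.items() if v is not None})
    return json.dumps(body)
```

For example, `speak_body(voice_id, "Press one for sales, two for support.", instructions="slow and reassuring")` yields the same JSON as the curl call above.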

## 5. Iterate on the design

If the voice doesn't match what you wanted, edit the design prompt via `PATCH /voices/{voice_id}`:

```bash
curl -X PATCH "$SCAIGRID_HOST/v1/modules/scaispeak/voices/$VOICE_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"voice_design_prompt":"warm female narrator, slightly slower pace, hint of British accent"}'
```

Every subsequent `/v1/speak` call against the voice uses the new description. The voice ID stays stable, so any callers that store the ID don't need to change.
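The same PATCH can be prepared from Python with only the standard library. The sketch below builds the request without sending it, which makes it easy to inspect or log; `patch_design_request` is a hypothetical helper name, and the URL layout is copied from the curl example.

```python
import json
import urllib.request


def patch_design_request(host, api_key, voice_id, new_prompt):
    """Build (but do not send) a PATCH that replaces the voice's design prompt."""
    url = f"{host}/v1/modules/scaispeak/voices/{voice_id}"
    body = json.dumps({"voice_design_prompt": new_prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it; an HTTP error status surfaces as `urllib.error.HTTPError`.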

## 6. Switch a cloned voice to design-only

You can convert an existing cloned voice into a designed voice. The conversion drops the reference clip and consent record and replaces the identity with a text prompt. This is useful when a tenant decides they no longer want a real person's voice on file:

```bash
curl -X PATCH "$SCAIGRID_HOST/v1/modules/scaispeak/voices/$VOICE_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_design_prompt": "warm professional voice, similar character to the previous reference",
    "clear_reference": true
  }'
```

`clear_reference: true` tombstones the reference WAV and consent recording in storage. It requires `voice_design_prompt` in the same patch: clearing the reference without a fallback identity is rejected with `SCAISPEAK_DESIGN_PROMPT_REQUIRED_ON_CLEAR`. The voice ID does not change, so `voice_id` references in your own systems stay valid.
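The pairing rule can be mirrored on the client so that an invalid patch never leaves your code. This is a sketch under the rule stated above; `conversion_patch` is a hypothetical name, and the local check only approximates the server-side validation.

```python
def conversion_patch(design_prompt, clear_reference=True):
    """Build the PATCH body for switching a cloned voice to design-only."""
    has_prompt = bool(design_prompt and design_prompt.strip())
    if clear_reference and not has_prompt:
        # Mirrors the server's SCAISPEAK_DESIGN_PROMPT_REQUIRED_ON_CLEAR rejection.
        raise ValueError("clear_reference requires voice_design_prompt in the same patch")
    body = {}
    if has_prompt:
        body["voice_design_prompt"] = design_prompt.strip()
    if clear_reference:
        body["clear_reference"] = True
    return body
```

Serialise the returned dict with `json.dumps` and send it as the body of the PATCH request shown above.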

## When you're done with it

Same erasure flow as cloned voices:

```bash
curl -X DELETE "$SCAIGRID_HOST/v1/modules/scaispeak/voices/$VOICE_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

For designed voices there's no consent recording to erase (none was captured), but the voice row and any stored synthesis outputs are tombstoned the same way. The audit row serves as proof of the erasure.

## Done

You now have a brand voice with no real-person likeness and no consent overhead, usable immediately and editable in place. The same `/v1/speak` API serves both designed and cloned voices, so your callers don't need to know which kind they're talking to.
