Audio API
Generate text-to-speech audio from any text with natural language style control.
POST /audio/create
Create a new text-to-speech audio generation request.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to convert to speech (10–100,000 characters) |
| voice | string | No | Voice to use (see options below). Default: Kore |
| language | string | No | Language code. Default: en-US |
| format | string | No | mp3, opus, or wav. Default: mp3 |
| style | string | No | Natural language style instructions (e.g. "Speak warmly"). Max 4,000 chars. |
| model | string | No | Model tier: basic, standard (default), or advanced |
| webhook_url | string | No | URL to receive completion notification |
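The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the function name `build_audio_payload` is hypothetical, and it assumes the server applies the documented defaults to any option left out of the body.

```python
from typing import Any

# Limits copied from the parameter table above.
TEXT_MIN, TEXT_MAX = 10, 100_000
STYLE_MAX = 4_000
FORMATS = {"mp3", "opus", "wav"}
MODELS = {"basic", "standard", "advanced"}
OPTIONAL = {"voice", "language", "format", "style", "model", "webhook_url"}


def build_audio_payload(text: str, **options: Any) -> dict:
    """Return a JSON-ready body for POST /audio/create (hypothetical helper)."""
    if not TEXT_MIN <= len(text) <= TEXT_MAX:
        raise ValueError(f"text must be {TEXT_MIN} to {TEXT_MAX} characters")

    # Drop unset options so the documented defaults are assumed to apply.
    opts = {k: v for k, v in options.items() if v is not None}

    unknown = set(opts) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if opts.get("format", "mp3") not in FORMATS:
        raise ValueError("format must be mp3, opus, or wav")
    if opts.get("model", "standard") not in MODELS:
        raise ValueError("model must be basic, standard, or advanced")
    if len(opts.get("style", "")) > STYLE_MAX:
        raise ValueError(f"style must be at most {STYLE_MAX} characters")

    return {"text": text, **opts}
```

Validating locally avoids a round trip for requests that the documented limits would reject anyway; the server remains the authority on what is accepted.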
Voice Options
30 TTS voices available.
| Voice ID | Character |
|---|---|
| Kore | Kore (Smooth) (default) |
| Puck | Puck (Bright) |
| Charon | Charon (Informative) |
| Fenrir | Fenrir (Energetic) |
| Zephyr | Zephyr (Bright) |
| Aoede | Aoede (Breezy) |
| Leda | Leda (Easy-going) |
| Orus | Orus (Smooth) |
| Achernar | Achernar (Soft) |
| Achird | Achird (Friendly) |
| Algenib | Algenib (Gravelly) |
| Algieba | Algieba (Smooth) |
| Alnilam | Alnilam (Firm) |
| Autonoe | Autonoe (Bright) |
| Callirrhoe | Callirrhoe (Easy-going) |
| Despina | Despina (Smooth) |
| Enceladus | Enceladus (Breathy) |
| Erinome | Erinome (Informative) |
| Gacrux | Gacrux (Mature) |
| Iapetus | Iapetus (Informative) |
| Laomedeia | Laomedeia (Bright) |
| Pulcherrima | Pulcherrima (Forward) |
| Rasalgethi | Rasalgethi (Informative) |
| Sadachbia | Sadachbia (Energetic) |
| Sadaltager | Sadaltager (Informative) |
| Schedar | Schedar (Even) |
| Sulafat | Sulafat (Warm) |
| Umbriel | Umbriel (Easy-going) |
| Vindemiatrix | Vindemiatrix (Gentle) |
| Zubenelgenubi | Zubenelgenubi (Casual) |
Format Notes
| Format | Notes |
|---|---|
| mp3 | Recommended. Supports all text lengths up to 100,000 characters. |
| opus | OGG Opus format. Supports all text lengths up to 100,000 characters. |
| wav | Uncompressed WAV. Supports all text lengths up to 100,000 characters. Note: large file sizes. |
Example Request
curl -X POST https://api.genrex.io/v1/audio/create \
  -H "Authorization: Bearer grx_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to GenRex. This platform helps you generate documents, images, and audio using AI.",
    "voice": "Kore",
    "format": "mp3",
    "style": "Speak warmly and conversationally"
  }'
Example Response
{
  "success": true,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
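The same request can be issued from Python with only the standard library. This is a sketch rather than an official SDK: `make_create_request` and `extract_job_id` are hypothetical helper names, the base URL is taken from the curl example, and treating a falsy `success` field as a failure is an assumption, since the error response shape is not documented here.

```python
import json
import urllib.request

API_BASE = "https://api.genrex.io/v1"  # base URL from the curl example


def make_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /audio/create request."""
    return urllib.request.Request(
        f"{API_BASE}/audio/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_job_id(response_body: bytes) -> str:
    """Return the job id from a create response body."""
    body = json.loads(response_body)
    # Assumption: failed requests report "success": false.
    if not body.get("success"):
        raise RuntimeError(f"request was not successful: {body}")
    return body["id"]
```

To send the request, pass the result to `urllib.request.urlopen` and hand the bytes from `read()` to `extract_job_id`. The returned id is the value used with the status endpoint.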
GET /audio/status/{uuid}
Check audio generation status and retrieve the audio URL when completed.
Example Response (Completed)
{
  "success": true,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "data": {
    "url": "https://genrex.io/audio/42/1708300800_f3a1b2c3.mp3",
    "duration": "1 min 12 sec",
    "character_count": 892,
    "format": "mp3"
  }
}
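Because generation is asynchronous, a client that does not use `webhook_url` will typically poll the status endpoint until `status` is `completed`. The sketch below shows one way to do that. It makes several assumptions: `wait_for_audio` is a hypothetical name, `fetch_status` is a caller-supplied function that performs the GET and returns the decoded JSON, the interval and timeout values are arbitrary, and any status other than `completed` is treated as still in progress because failure statuses are not listed in this section.

```python
import time
from typing import Callable


def wait_for_audio(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is "completed", then return the "data" object."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        # Assumption: a falsy "success" field signals an error.
        if not body.get("success"):
            raise RuntimeError(f"status check failed: {body}")
        if body.get("status") == "completed":
            return body["data"]
        if time.monotonic() >= deadline:
            raise TimeoutError("audio generation did not complete in time")
        sleep(interval)
```

Passing the fetch and sleep functions in keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.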
Audio in Documents
You can also generate audio narration as part of the document pipeline by setting audio.enabled to true when creating a document.
Additional Document Parameters
| Parameter | Type | Description |
|---|---|---|
| audio.enabled | boolean | Enable audio narration. Default: false |
| audio.voice | string | TTS voice for narration. Default: Kore |
| audio.language | string | Language code. Default: en-US |
| audio.format | string | Audio format. Default: mp3 |
| audio.style | string | Voice style instructions (e.g. "Speak warmly"). Max 4,000 chars. |
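As an illustration, the narration settings could be assembled like this before being merged into a document request. This sketch assumes the dotted names in the table map to a nested `audio` object in the JSON body, which this section does not confirm; `audio_options` is a hypothetical helper, and the document creation endpoint itself is described elsewhere.

```python
from typing import Optional

STYLE_MAX = 4_000  # limit from the table above


def audio_options(
    voice: Optional[str] = None,
    language: Optional[str] = None,
    audio_format: Optional[str] = None,
    style: Optional[str] = None,
) -> dict:
    """Return an {"audio": {...}} fragment that enables narration."""
    if style is not None and len(style) > STYLE_MAX:
        raise ValueError(f"style must be at most {STYLE_MAX} characters")

    chosen = {
        "voice": voice,
        "language": language,
        "format": audio_format,
        "style": style,
    }
    # Assumption: nested layout; unset options are omitted so defaults apply.
    audio = {"enabled": True}
    audio.update({k: v for k, v in chosen.items() if v is not None})
    return {"audio": audio}
```

The returned fragment can be combined with the other document fields, for example `{**document_fields, **audio_options(voice="Puck")}`, where `document_fields` stands for whatever the document endpoint requires.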
Document Pipeline with Audio
When audio is enabled, the document pipeline adds an audio step after image generation:
pending → researching → generating → images → audio → completed
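The stage order above lends itself to a simple progress indicator. The following sketch assumes the status field reports these exact stage names, which this section implies but does not state; `pipeline_progress` is a hypothetical helper.

```python
# Stage order for a document created with audio enabled, as listed above.
PIPELINE = ["pending", "researching", "generating", "images", "audio", "completed"]


def pipeline_progress(status: str) -> float:
    """Return how far along the pipeline a status is, from 0.0 to 1.0."""
    index = PIPELINE.index(status)  # raises ValueError for an unknown status
    return index / (len(PIPELINE) - 1)
```

For example, `pipeline_progress("audio")` evaluates to 0.8, since the audio step is the fifth of six stages. The fraction reflects position in the sequence only, not elapsed or remaining time.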