Audio API

Generate text-to-speech audio from any text with natural language style control.

POST /audio/create

Create a new text-to-speech audio generation request.

Request Body

Parameter    Type    Required  Description
text         string  Yes       The text to convert to speech (10–100,000 characters)
voice        string  No        Voice to use (see options below). Default: Kore
language     string  No        Language code. Default: en-US
format       string  No        mp3, opus, or wav. Default: mp3
style        string  No        Natural language style instructions (e.g. "Speak warmly"). Max 4,000 chars.
model        string  No        Model tier: basic, standard (default), or advanced
webhook_url  string  No        URL to receive completion notification
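
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the GenRex API: the function name build_audio_request is hypothetical, and it assumes that the character limits are counted the way Python's len() counts them, which the table does not specify.

```python
ALLOWED_FORMATS = {"mp3", "opus", "wav"}
ALLOWED_MODELS = {"basic", "standard", "advanced"}


def build_audio_request(text, voice=None, language=None, audio_format=None,
                        style=None, model=None, webhook_url=None):
    """Return a JSON-ready body for POST /audio/create, or raise ValueError."""
    # Assumption: the documented character limits match Python's len().
    if not 10 <= len(text) <= 100_000:
        raise ValueError("text must be 10 to 100,000 characters long")
    if audio_format is not None and audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if model is not None and model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if style is not None and len(style) > 4_000:
        raise ValueError("style must be at most 4,000 characters")

    body = {"text": text}
    optional = {
        "voice": voice,
        "language": language,
        "format": audio_format,
        "style": style,
        "model": model,
        "webhook_url": webhook_url,
    }
    # Omit unset fields so the server applies its documented defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Leaving optional fields out, rather than sending the defaults explicitly, keeps the request valid if the server-side defaults change.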

Voice Options

30 TTS voices available.

Voice ID       Character
Kore           Kore (Smooth) (default)
Puck           Puck (Bright)
Charon         Charon (Informative)
Fenrir         Fenrir (Energetic)
Zephyr         Zephyr (Bright)
Aoede          Aoede (Breezy)
Leda           Leda (Easy-going)
Orus           Orus (Smooth)
Achernar       Achernar (Soft)
Achird         Achird (Friendly)
Algenib        Algenib (Gravelly)
Algieba        Algieba (Smooth)
Alnilam        Alnilam (Firm)
Autonoe        Autonoe (Bright)
Callirrhoe     Callirrhoe (Easy-going)
Despina        Despina (Smooth)
Enceladus      Enceladus (Breathy)
Erinome        Erinome (Informative)
Gacrux         Gacrux (Mature)
Iapetus        Iapetus (Informative)
Laomedeia      Laomedeia (Bright)
Pulcherrima    Pulcherrima (Forward)
Rasalgethi     Rasalgethi (Informative)
Sadachbia      Sadachbia (Energetic)
Sadaltager     Sadaltager (Informative)
Schedar        Schedar (Even)
Sulafat        Sulafat (Warm)
Umbriel        Umbriel (Easy-going)
Vindemiatrix   Vindemiatrix (Gentle)
Zubenelgenubi  Zubenelgenubi (Casual)

Format Notes

Format  Notes
mp3     Recommended. Supports all text lengths up to 100,000 characters.
opus    OGG Opus format. Supports all text lengths up to 100,000 characters.
wav     Uncompressed WAV. Supports all text lengths up to 100,000 characters, but produces large files.

Example Request

curl -X POST https://api.genrex.io/v1/audio/create \
  -H "Authorization: Bearer grx_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to GenRex. This platform helps you generate documents, images, and audio using AI.",
    "voice": "Kore",
    "format": "mp3",
    "style": "Speak warmly and conversationally"
  }'

Example Response

{
  "success": true,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
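
The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the base URL and headers are taken from the curl example, the helper names make_create_request and create_audio are hypothetical, and treating "success": false as an error is a guess, since the shape of a failed response is not documented here.

```python
import json
import urllib.request

API_BASE = "https://api.genrex.io/v1"  # base URL taken from the curl example


def make_create_request(api_key, body):
    """Build (but do not send) the POST /audio/create request."""
    return urllib.request.Request(
        f"{API_BASE}/audio/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_audio(api_key, body, timeout=30):
    """Send the request and return the generation id from the response."""
    request = make_create_request(api_key, body)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # Assumption: a rejected request is reported with "success": false.
    if not reply.get("success"):
        raise RuntimeError(f"audio request was not accepted: {reply}")
    return reply["id"]
```

The returned id is the value to pass to the status endpoint.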

GET /audio/status/{uuid}

Check the status of an audio generation request and, once it has completed, retrieve the audio URL.

Example Response (Completed)

{
  "success": true,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "data": {
    "url": "https://genrex.io/audio/42/1708300800_f3a1b2c3.mp3",
    "duration": "1 min 12 sec",
    "character_count": 892,
    "format": "mp3"
  }
}
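
Because generation is asynchronous, a client that does not use webhook_url has to poll this endpoint. The loop below is one possible sketch. The function name wait_for_audio, the polling interval, and the timeout are all hypothetical choices, and fetch_status stands for any caller-supplied function that performs GET /audio/status/{uuid} and returns the decoded JSON. Only the "completed" status appears in this section, so the sketch treats every other status as "keep waiting" and does not try to detect failures.

```python
import time


def wait_for_audio(job_id, fetch_status, interval=2.0, timeout=300.0,
                   sleep=time.sleep):
    """Poll until the job reports status "completed", then return its data.

    fetch_status(job_id) must return the decoded JSON reply of
    GET /audio/status/{uuid} as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        reply = fetch_status(job_id)
        if reply.get("status") == "completed":
            # Holds url, duration, character_count and format.
            return reply["data"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"audio {job_id} not completed after {timeout} s")
        # Assumption: any other status means the job is still in progress.
        sleep(interval)
```

Passing the HTTP call in as a function keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned replies.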

Audio in Documents

You can also generate audio narration as part of the document pipeline by setting audio.enabled to true when creating a document.

Additional Document Parameters

Parameter       Type     Description
audio.enabled   boolean  Enable audio narration. Default: false
audio.voice     string   TTS voice for narration. Default: Kore
audio.language  string   Language code. Default: en-US
audio.format    string   Audio format. Default: mp3
audio.style     string   Voice style instructions (e.g. "Speak warmly"). Max 4,000 chars.
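
As an illustration, the parameters above could be attached to a document request as follows. This sketch rests on an assumption that this section does not confirm: that dotted names such as audio.enabled refer to fields of a nested "audio" object in the JSON body. The helper name with_audio_narration is hypothetical, and the document creation parameters themselves are described elsewhere.

```python
def with_audio_narration(document_params, voice=None, language=None,
                         audio_format=None, style=None):
    """Return a copy of document_params with audio narration switched on."""
    if style is not None and len(style) > 4_000:
        raise ValueError("audio.style must be at most 4,000 characters")

    audio = {"enabled": True}
    optional = {
        "voice": voice,
        "language": language,
        "format": audio_format,
        "style": style,
    }
    # Omit unset fields so the documented defaults (Kore, en-US, mp3) apply.
    audio.update({key: value for key, value in optional.items() if value is not None})

    # Assumption: audio.* parameters are sent as a nested "audio" object.
    return {**document_params, "audio": audio}
```

The input dictionary is copied rather than modified, so the same base parameters can be reused for requests with and without narration.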

Document Pipeline with Audio

When audio is enabled, the document pipeline adds an audio step after image generation:

pending → researching → generating → images → audio → completed
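
A client that displays progress can treat the statuses as an ordered list. The sketch below assumes that a document created without audio passes through the same steps minus "audio", which is inferred from the sentence above rather than stated; the names pipeline_steps and progress are hypothetical.

```python
# Assumption: without audio, the pipeline is the same sequence minus "audio".
BASE_PIPELINE = ["pending", "researching", "generating", "images", "completed"]


def pipeline_steps(audio_enabled):
    """Return the ordered status list for a document."""
    steps = list(BASE_PIPELINE)
    if audio_enabled:
        # The audio step runs directly after image generation.
        steps.insert(steps.index("images") + 1, "audio")
    return steps


def progress(status, audio_enabled):
    """Return a fraction from 0.0 (pending) to 1.0 (completed)."""
    steps = pipeline_steps(audio_enabled)
    # list.index raises ValueError for a status outside the pipeline.
    return steps.index(status) / (len(steps) - 1)
```

For example, progress("images", True) is 0.6, while progress("images", False) is 0.75 because the shorter pipeline has one step fewer.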