Audio API

Generate text-to-speech audio from text, with optional natural-language control over speaking style.

POST /audio/create

Create a new text-to-speech audio generation request.

Request Body

Parameter	Type	Required	Description
`text`	string	Yes	The text to convert to speech (10–100,000 characters)
`voice`	string	No	Voice to use (see options below). Default: `Kore`
`language`	string	No	Language code. Default: `en-US`
`format`	string	No	`mp3`, `opus`, or `wav`. Default: `mp3`
`style`	string	No	Natural language style instructions (e.g. "Speak warmly"). Max 4,000 chars.
`model`	string	No	Model tier: `basic`, `standard` (default), or `advanced`
`webhook_url`	string	No	URL that receives a notification when generation completes
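The limits in the table can be checked on the client before a request is sent. The sketch below is a minimal, hypothetical helper (`build_create_payload` is an illustrative name, not part of any GenRex SDK). It mirrors only the constraints documented above, assumes the API counts characters the way Python's `len()` does (it may not), and omits unset options so the server applies its documented defaults. The server remains the authority on validation.

```python
import json

# Limits and allowed values copied from the request body table.
TEXT_MIN, TEXT_MAX = 10, 100_000
STYLE_MAX = 4_000
FORMATS = {"mp3", "opus", "wav"}
MODELS = {"basic", "standard", "advanced"}


def build_create_payload(text, voice=None, language=None, fmt=None,
                         style=None, model=None, webhook_url=None):
    """Validate parameters client-side and return a JSON-ready dict.

    `fmt` maps to the `format` field. Options left as None are omitted so
    the server applies its defaults (Kore, en-US, mp3, standard).
    """
    if not TEXT_MIN <= len(text) <= TEXT_MAX:
        raise ValueError(f"text must be {TEXT_MIN}-{TEXT_MAX} characters, got {len(text)}")
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    if model is not None and model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    if style is not None and len(style) > STYLE_MAX:
        raise ValueError(f"style must be at most {STYLE_MAX} characters")

    payload = {"text": text}
    optional = {"voice": voice, "language": language, "format": fmt,
                "style": style, "model": model, "webhook_url": webhook_url}
    # Keep only the parameters the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(json.dumps(build_create_payload("Hello from the Audio API.", fmt="opus")))
# {"text": "Hello from the Audio API.", "format": "opus"}
```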

Voice Options

The following 30 TTS voices are available.

Voice ID	Character
`Kore`	Kore (Smooth) (default)
`Puck`	Puck (Bright)
`Charon`	Charon (Informative)
`Fenrir`	Fenrir (Energetic)
`Zephyr`	Zephyr (Bright)
`Aoede`	Aoede (Breezy)
`Leda`	Leda (Easy-going)
`Orus`	Orus (Smooth)
`Achernar`	Achernar (Soft)
`Achird`	Achird (Friendly)
`Algenib`	Algenib (Gravelly)
`Algieba`	Algieba (Smooth)
`Alnilam`	Alnilam (Firm)
`Autonoe`	Autonoe (Bright)
`Callirrhoe`	Callirrhoe (Easy-going)
`Despina`	Despina (Smooth)
`Enceladus`	Enceladus (Breathy)
`Erinome`	Erinome (Informative)
`Gacrux`	Gacrux (Mature)
`Iapetus`	Iapetus (Informative)
`Laomedeia`	Laomedeia (Bright)
`Pulcherrima`	Pulcherrima (Forward)
`Rasalgethi`	Rasalgethi (Informative)
`Sadachbia`	Sadachbia (Energetic)
`Sadaltager`	Sadaltager (Informative)
`Schedar`	Schedar (Even)
`Sulafat`	Sulafat (Warm)
`Umbriel`	Umbriel (Easy-going)
`Vindemiatrix`	Vindemiatrix (Gentle)
`Zubenelgenubi`	Zubenelgenubi (Casual)
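If voices are chosen programmatically, the table can be kept as a lookup. This is a minimal sketch: `VOICES` and `voices_with` are illustrative names, the character labels are copied from the table above, and voice IDs are assumed to be exact, case-sensitive strings.

```python
# Voice ID -> character label, copied from the Voice Options table.
VOICES = {
    "Kore": "Smooth", "Puck": "Bright", "Charon": "Informative",
    "Fenrir": "Energetic", "Zephyr": "Bright", "Aoede": "Breezy",
    "Leda": "Easy-going", "Orus": "Smooth", "Achernar": "Soft",
    "Achird": "Friendly", "Algenib": "Gravelly", "Algieba": "Smooth",
    "Alnilam": "Firm", "Autonoe": "Bright", "Callirrhoe": "Easy-going",
    "Despina": "Smooth", "Enceladus": "Breathy", "Erinome": "Informative",
    "Gacrux": "Mature", "Iapetus": "Informative", "Laomedeia": "Bright",
    "Pulcherrima": "Forward", "Rasalgethi": "Informative",
    "Sadachbia": "Energetic", "Sadaltager": "Informative", "Schedar": "Even",
    "Sulafat": "Warm", "Umbriel": "Easy-going", "Vindemiatrix": "Gentle",
    "Zubenelgenubi": "Casual",
}
DEFAULT_VOICE = "Kore"


def voices_with(character):
    """Return voice IDs whose label matches `character` (case-insensitive), sorted."""
    return sorted(v for v, c in VOICES.items() if c.lower() == character.lower())


print(voices_with("Informative"))
# ['Charon', 'Erinome', 'Iapetus', 'Rasalgethi', 'Sadaltager']
```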

Format Notes

Format	Notes
`mp3`	Recommended. Supports the full text range (up to 100,000 characters).
`opus`	Ogg Opus. Supports the full text range (up to 100,000 characters).
`wav`	Uncompressed WAV. Supports the full text range (up to 100,000 characters), but files are much larger than `mp3` or `opus`.

Example Request

curl -X POST https://api.genrex.io/v1/audio/create \
  -H "Authorization: Bearer grx_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to GenRex. This platform helps you generate documents, images, and audio using AI.",
    "voice": "Kore",
    "format": "mp3",
    "style": "Speak warmly and conversationally"
  }'
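For callers not using curl, the same request can be built with Python's standard library. This sketch only constructs the `urllib.request.Request` (the helper name `create_audio_request` is illustrative); sending it and reading the returned `id` is shown in the trailing comment. Replace the placeholder key with your own.

```python
import json
import urllib.request

API_BASE = "https://api.genrex.io/v1"
API_KEY = "grx_sk_your_api_key"  # placeholder key from the curl example


def create_audio_request(payload, api_key=API_KEY):
    """Build (without sending) the POST /audio/create request shown above."""
    return urllib.request.Request(
        f"{API_BASE}/audio/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = create_audio_request({
    "text": "Welcome to GenRex. This platform helps you generate documents, "
            "images, and audio using AI.",
    "voice": "Kore",
    "format": "mp3",
    "style": "Speak warmly and conversationally",
})

# Sending it returns JSON such as {"success": true, "id": "..."}:
# with urllib.request.urlopen(req) as resp:
#     audio_id = json.load(resp)["id"]
```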

Example Response

{
  "success": true,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

GET /audio/status/{uuid}

Check the status of an audio generation request and retrieve the audio URL once it has completed. `{uuid}` is the `id` returned by `POST /audio/create`.

Example Response (Completed)

{
  "success": true,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "data": {
    "url": "https://genrex.io/audio/42/1708300800_f3a1b2c3.mp3",
    "duration": "1 min 12 sec",
    "character_count": 892,
    "format": "mp3"
  }
}
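Because generation is asynchronous, a client typically polls this endpoint with the `id` from `POST /audio/create` until `status` is `completed`; a `webhook_url` notification is the alternative to polling. The sketch below assumes several things this reference does not state: the status path sits under the same `https://api.genrex.io/v1` base as the create endpoint, it accepts the same Bearer token, and a failed job reports `"status": "failed"` or `"success": false` (only `completed` is documented). The helper names and the poll interval are illustrative. `duration_seconds` parses the human-readable `duration` string and handles only the `min` and `sec` units seen in the example.

```python
import json
import re
import time
import urllib.request

API_BASE = "https://api.genrex.io/v1"


def status_url(audio_id):
    # Assumption: the status endpoint shares the /v1 base used by /audio/create.
    return f"{API_BASE}/audio/status/{audio_id}"


def fetch_status(audio_id, api_key):
    """GET the status JSON for one request (assumes the same Bearer auth)."""
    req = urllib.request.Request(
        status_url(audio_id), headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_audio(audio_id, fetch, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll `fetch(audio_id)` until status is "completed"; return its "data" object."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(audio_id)
        if body.get("status") == "completed":
            return body["data"]
        # Assumed failure signals; only "completed" is documented.
        if body.get("status") == "failed" or body.get("success") is False:
            raise RuntimeError(f"audio generation failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"audio {audio_id} not ready after {timeout} s")
        sleep(interval)


def duration_seconds(text):
    """Convert a duration such as "1 min 12 sec" to seconds (min/sec units only)."""
    units = {"min": 60, "sec": 1}
    parts = re.findall(r"(\d+)\s*(min|sec)\b", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(n) * units[u] for n, u in parts)


print(duration_seconds("1 min 12 sec"))  # 72
```

For example, `wait_for_audio(audio_id, lambda i: fetch_status(i, api_key))` returns the `data` object, whose `url` can then be downloaded with `urllib.request.urlretrieve`. Passing `fetch` in as a callable keeps the polling loop independent of the HTTP client.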

Audio in Documents

You can also generate audio narration as part of the document pipeline by setting `audio.enabled` to `true` when creating a document.

Additional Document Parameters

Parameter	Type	Description
`audio.enabled`	boolean	Enable audio narration. Default: `false`
`audio.voice`	string	TTS voice for narration. Default: `Kore`
`audio.language`	string	Language code. Default: `en-US`
`audio.format`	string	Audio format. Default: `mp3`
`audio.style`	string	Voice style instructions (e.g. "Speak warmly"). Max 4,000 chars.
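When a document request is built in code, the audio options can be assembled as one nested object. This sketch assumes the dotted parameter names (`audio.enabled`, `audio.voice`, and so on) correspond to a nested `audio` object in the JSON body; the document creation endpoint and its other fields are documented elsewhere and are not shown. `document_audio_options` is an illustrative name.

```python
import json

STYLE_MAX = 4_000  # documented limit for audio.style


def document_audio_options(voice=None, language=None, fmt=None, style=None):
    """Return {"audio": {...}} with narration enabled.

    Options left as None are omitted so the documented defaults apply
    (Kore, en-US, mp3). Assumes dotted names map to a nested object.
    """
    if style is not None and len(style) > STYLE_MAX:
        raise ValueError(f"audio.style must be at most {STYLE_MAX} characters")
    audio = {"enabled": True}
    for key, value in (("voice", voice), ("language", language),
                       ("format", fmt), ("style", style)):
        if value is not None:
            audio[key] = value
    return {"audio": audio}


# Merge these keys into the rest of your document creation body.
print(json.dumps(document_audio_options(voice="Sulafat", style="Speak warmly")))
# {"audio": {"enabled": true, "voice": "Sulafat", "style": "Speak warmly"}}
```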

Document Pipeline with Audio

When audio is enabled, the document pipeline adds an audio step after image generation:

pending → researching → generating → images → audio → completed
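A client that displays progress can map a document's status onto this sequence. The sketch assumes the document status field reports exactly these stage names, and that with audio disabled the pipeline is the same sequence minus the `audio` step. The resulting fraction counts stages, not elapsed time, and any status outside the sequence (failure states are not documented here) raises `ValueError`.

```python
# Stage order from the pipeline above.
STAGES_WITH_AUDIO = ["pending", "researching", "generating", "images", "audio", "completed"]


def pipeline_stages(audio_enabled):
    """Expected stage sequence; assumes the audio step is simply absent when disabled."""
    if audio_enabled:
        return list(STAGES_WITH_AUDIO)
    return [s for s in STAGES_WITH_AUDIO if s != "audio"]


def progress(status, audio_enabled):
    """Fraction of stages reached: 0.0 at "pending", 1.0 at "completed".

    Raises ValueError for a status outside the sequence.
    """
    stages = pipeline_stages(audio_enabled)
    return stages.index(status) / (len(stages) - 1)


print(progress("audio", audio_enabled=True))    # 0.8
print(progress("images", audio_enabled=False))  # 0.75
```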