Speech & Sound
Convert text to speech or generate sound effects with the Kolbo API, using ElevenLabs and other providers.
List Voices
Discover available voices before generating speech. Returns both platform preset voices and your custom cloned/designed voices.
Endpoint
GET /api/v1/voices
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| provider | string | Filter by provider (e.g., "elevenLabs", "google") |
| language | string | Filter by language name or code (e.g., "English", "en-US") |
| gender | string | Filter by gender (e.g., "Female", "Male") |
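The filters above combine as an ordinary query string. A minimal Python sketch, assuming the base URL `https://api.kolbo.ai` used in the curl examples on this page; `build_voices_url` is a hypothetical helper name, not part of the Kolbo API:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.kolbo.ai"  # assumption: same host as the curl examples


def build_voices_url(provider=None, language=None, gender=None):
    """Return the List Voices URL with only the filters that were supplied."""
    filters = {"provider": provider, "language": language, "gender": gender}
    # Drop unset filters so the query string stays minimal.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{BASE_URL}/api/v1/voices"
    return f"{url}?{query}" if query else url
```

Calling `build_voices_url(gender="Female")` yields the same URL as the curl example for this endpoint.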
Example
curl "https://api.kolbo.ai/api/v1/voices?gender=Female" \
-H "X-API-Key: kolbo_live_..."
Response
{
"success": true,
"voices": [
{
"voice_id": "EXAVITQu4vr4xnSDxMaL",
"name": "Rachel",
"provider": "elevenLabs",
"language": "English",
"language_code": "en-US",
"gender": "Female",
"accent": "American",
"preview_url": "https://...",
"styles": ["conversational", "calm"],
"custom": false
},
{
"voice_id": "custom_abc123",
"name": "My Cloned Voice",
"provider": "elevenlabs",
"language": "auto",
"language_code": null,
"gender": null,
"accent": null,
"preview_url": null,
"styles": [],
"custom": true
}
],
"count": 152
}
Use the voice_id from this response as the voice parameter in the speech endpoint. You can also pass a voice name (e.g., "Rachel") and the API will resolve it automatically.
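Picking a voice from the parsed response can be sketched in Python as follows. The helper names are hypothetical, and the case-insensitive name match is an assumption made for convenience on the client side, not documented API behavior:

```python
def find_voice_id(voices, name):
    """Return the voice_id of the first voice whose name matches, else None."""
    wanted = name.casefold()  # assumption: match names case-insensitively
    for voice in voices:
        if voice.get("name", "").casefold() == wanted:
            return voice["voice_id"]
    return None


def custom_voices(voices):
    """Return only cloned/designed voices (entries with custom set to true)."""
    return [v for v in voices if v.get("custom")]


# Trimmed copy of the sample response shown above.
sample = {
    "success": True,
    "voices": [
        {"voice_id": "EXAVITQu4vr4xnSDxMaL", "name": "Rachel", "custom": False},
        {"voice_id": "custom_abc123", "name": "My Cloned Voice", "custom": True},
    ],
}

rachel_id = find_voice_id(sample["voices"], "Rachel")
mine = custom_voices(sample["voices"])
```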
Text to Speech
Endpoint
POST /api/v1/generate/speech
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to convert to speech |
| voice | string | No | Voice ID or name (default: "Rachel") |
| model | string | No | TTS model (default: "eleven_v3") |
| language | string | No | Language code, e.g. "en-US", "he-IL" (default: "en-US") |
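As a rough illustration of the request body, the following Python sketch builds (but does not send) the POST request using only the standard library. `build_speech_request` is a hypothetical helper; omitting unset optional fields so the server applies its documented defaults is an assumption about how the API treats missing keys:

```python
import json
import urllib.request

SPEECH_URL = "https://api.kolbo.ai/api/v1/generate/speech"


def build_speech_request(api_key, text, voice=None, model=None, language=None):
    """Build a POST request for the speech endpoint without sending it."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text}
    optional = {"voice": voice, "model": model, "language": language}
    # Leave out unset fields so the documented defaults apply (assumption).
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected first.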
Example
curl -X POST https://api.kolbo.ai/api/v1/generate/speech \
-H "X-API-Key: kolbo_live_..." \
-H "Content-Type: application/json" \
-d '{
"text": "Welcome to Kolbo AI, the all-in-one creative platform.",
"voice": "Rachel",
"language": "en-US"
}'
Response
Generation Started
{
"success": true,
"generation_id": "tts123",
"type": "speech",
"model": "eleven_v3",
"credits_charged": 2,
"poll_url": "/api/v1/generate/tts123/status",
"poll_interval_hint": 3
}
Completed Status
{
"success": true,
"generation_id": "tts123",
"state": "completed",
"progress": 100,
"result": {
"urls": ["https://cdn.kolbo.ai/audio/..."],
"model": "eleven_v3",
"voice": "Rachel",
"duration": 4.5
}
}
Credits
Speech credits are character-based: ceil(text.length / 100) x model.credit, where model.credit is the model's cost in credits per 100 characters.
For example, a 250-character text with a model that costs 1 credit per 100 characters: ceil(250 / 100) x 1 = 3 credits.
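The credit formula can be checked with a few lines of Python. This is a sketch for estimating cost on the client side; `speech_credits` is a hypothetical name, and Python's `len` counts Unicode code points, which may differ from the server's `text.length` for characters outside the Basic Multilingual Plane:

```python
import math


def speech_credits(text, credits_per_100_chars):
    """Estimate credits: ceil(len(text) / 100) times the model's rate."""
    return math.ceil(len(text) / 100) * credits_per_100_chars


# The worked example above: 250 characters at 1 credit per 100 characters.
estimate = speech_credits("a" * 250, 1)
```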
Sound Effects
Endpoint
POST /api/v1/generate/sound
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Description of the sound effect |
| duration | number | No | Duration in seconds (omit for auto) |
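A small Python sketch of the request body for this endpoint. `build_sound_payload` is a hypothetical helper, and rejecting non-positive durations is a client-side assumption rather than a documented server rule:

```python
def build_sound_payload(prompt, duration=None):
    """Return the JSON body for the sound endpoint as a dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    payload = {"prompt": prompt}
    if duration is not None:
        if duration <= 0:  # assumption: only positive durations make sense
            raise ValueError("duration must be positive")
        payload["duration"] = duration
    # With duration omitted, the API chooses the length automatically.
    return payload
```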
Example
curl -X POST https://api.kolbo.ai/api/v1/generate/sound \
-H "X-API-Key: kolbo_live_..." \
-H "Content-Type: application/json" \
-d '{"prompt": "Thunder clap followed by heavy rain"}'
Response
Generation Started
{
"success": true,
"generation_id": "snd123",
"type": "sound",
"model": "auto",
"credits_charged": 5,
"poll_url": "/api/v1/generate/snd123/status",
"poll_interval_hint": 5
}
Completed Status
{
"success": true,
"generation_id": "snd123",
"state": "completed",
"progress": 100,
"result": {
"urls": ["https://cdn.kolbo.ai/audio/..."],
"model": "elevenlabs-sound",
"duration": 8
}
}
Tips
- Speech generation typically completes in 5-30 seconds
- Sound effects typically take 5-30 seconds
- Both endpoints return audio URLs that can be downloaded or streamed
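Both endpoints respond with a poll_url and a poll_interval_hint, and the audio URLs appear once the status reports "completed". A minimal polling sketch in Python follows. It assumes poll_interval_hint is in seconds, and it takes a caller-supplied `fetch_status` function (hypothetical) that performs the authenticated GET and returns the parsed JSON. Only the "completed" state appears in this page, so any other state is treated as still in progress until a timeout:

```python
import time


def wait_for_generation(start_response, fetch_status, max_wait=120.0,
                        sleep=time.sleep):
    """Poll the status URL until completion and return the audio URLs."""
    poll_url = start_response["poll_url"]
    # Assumption: the hint is a number of seconds between polls.
    interval = start_response.get("poll_interval_hint", 3)
    waited = 0.0
    while waited <= max_wait:
        status = fetch_status(poll_url)
        if status.get("state") == "completed":
            return status["result"]["urls"]
        # Other states are not documented here, so keep waiting.
        sleep(interval)
        waited += interval
    raise TimeoutError(f"generation not completed after {max_wait} seconds")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and easy to exercise without network access.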