Speech & Sound

Text-to-speech and sound effect generation with the Kolbo API.

Convert text to speech or generate sound effects using ElevenLabs and other providers.

List Voices

Discover available voices before generating speech. Returns both platform preset voices and your custom cloned/designed voices.

Endpoint

GET /api/v1/voices

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| provider | string | Filter by provider (e.g., "elevenLabs", "google") |
| language | string | Filter by language name or code (e.g., "English", "en-US") |
| gender | string | Filter by gender (e.g., "Female", "Male") |

Example

curl "https://api.kolbo.ai/api/v1/voices?gender=Female" \
  -H "X-API-Key: kolbo_live_..."

Response

{
  "success": true,
  "voices": [
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "name": "Rachel",
      "provider": "elevenLabs",
      "language": "English",
      "language_code": "en-US",
      "gender": "Female",
      "accent": "American",
      "preview_url": "https://...",
      "styles": ["conversational", "calm"],
      "custom": false
    },
    {
      "voice_id": "custom_abc123",
      "name": "My Cloned Voice",
      "provider": "elevenLabs",
      "language": "auto",
      "language_code": null,
      "gender": null,
      "accent": null,
      "preview_url": null,
      "styles": [],
      "custom": true
    }
  ],
  "count": 152
}

Use the voice_id from this response as the voice parameter of the Text to Speech endpoint. You can also pass a voice name (e.g., "Rachel"), which the API resolves to the matching voice automatically.
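As a rough illustration of the lookup described above, the Python sketch below builds a filtered List Voices URL and picks a voice_id out of a response shaped like the example. The helper names voices_url and find_voice_id are hypothetical, and the case-insensitive name match is an assumption of this sketch rather than documented API behaviour.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.kolbo.ai/api/v1/voices"


def voices_url(provider=None, language=None, gender=None):
    """Build the List Voices URL, including only the filters that were given."""
    filters = {"provider": provider, "language": language, "gender": gender}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


def find_voice_id(response, name):
    """Return the voice_id of the first voice with the given name, or None.

    The comparison ignores case; that is a choice made for this sketch.
    """
    wanted = name.lower()
    for voice in response.get("voices", []):
        if (voice.get("name") or "").lower() == wanted:
            return voice["voice_id"]
    return None
```

Sending the request itself only needs the X-API-Key header shown in the curl example; the returned JSON can then be passed to find_voice_id.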


Text to Speech

Endpoint

POST /api/v1/generate/speech

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text to convert to speech |
| voice | string | No | Voice ID or name (default: "Rachel") |
| model | string | No | TTS model (default: "eleven_v3") |
| language | string | No | Language code, e.g. "en-US", "he-IL" (default: "en-US") |

Example

curl -X POST https://api.kolbo.ai/api/v1/generate/speech \
  -H "X-API-Key: kolbo_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Kolbo AI, the all-in-one creative platform.",
    "voice": "Rachel",
    "language": "en-US"
  }'

Response

Generation Started

{
  "success": true,
  "generation_id": "tts123",
  "type": "speech",
  "model": "eleven_v3",
  "credits_charged": 2,
  "poll_url": "/api/v1/generate/tts123/status",
  "poll_interval_hint": 3
}

Completed Status

{
  "success": true,
  "generation_id": "tts123",
  "state": "completed",
  "progress": 100,
  "result": {
    "urls": ["https://cdn.kolbo.ai/audio/..."],
    "model": "eleven_v3",
    "voice": "Rachel",
    "duration": 4.5
  }
}
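The two responses above imply a submit-then-poll flow. The Python sketch below shows one way to wait for the "completed" state. Several details are assumptions not confirmed by this page: that poll_url is relative to https://api.kolbo.ai, that it is fetched with GET and the same X-API-Key header, that poll_interval_hint is in seconds, and that a failed job reports a state named "failed". The function names are hypothetical.

```python
import json
import time
import urllib.request


def http_fetch_status(poll_url, api_key, base="https://api.kolbo.ai"):
    """GET the status JSON for a generation (base URL and header are assumed)."""
    request = urllib.request.Request(base + poll_url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_generation(fetch_status, poll_url, interval=3.0, timeout=120.0,
                        sleep=time.sleep):
    """Poll until the status reports state "completed", then return its result.

    fetch_status is any callable that takes poll_url and returns the parsed
    status JSON, which keeps the loop independent of the HTTP client used.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(poll_url)
        state = status.get("state")
        if state == "completed":
            return status["result"]
        if state == "failed":  # assumed state name; not shown on this page
            raise RuntimeError(f"generation failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up waiting on {poll_url}")
        sleep(interval)
```

A caller would typically pass the poll_url and poll_interval_hint from the "Generation Started" response, for example wait_for_generation(lambda url: http_fetch_status(url, api_key), started["poll_url"], interval=started["poll_interval_hint"]).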

Credits

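Speech credits are charged per 100 characters of input text, rounded up: ceil(text.length / 100) × model.credit, where model.credit is the model's cost per 100 characters.

For example, a 250-character text with a model that costs 1 credit per 100 characters costs ceil(250 / 100) × 1 = 3 credits.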
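The formula can be checked locally before sending a request. The Python sketch below is a direct transcription; speech_credits is a hypothetical helper name. Note that Python's len() counts Unicode code points, whereas a JavaScript-style text.length counts UTF-16 code units, so the two can differ for text containing emoji or other characters outside the Basic Multilingual Plane. This page does not say which count the API uses.

```python
import math


def speech_credits(text, credits_per_100_chars):
    """Estimate speech credits: ceil(len(text) / 100) * model credit rate."""
    return math.ceil(len(text) / 100) * credits_per_100_chars


# The worked example from the text: 250 characters at 1 credit per 100.
print(speech_credits("a" * 250, 1))  # 3
```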


Sound Effects

Endpoint

POST /api/v1/generate/sound

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Description of the sound effect |
| duration | number | No | Duration in seconds (omit for auto) |

Example

curl -X POST https://api.kolbo.ai/api/v1/generate/sound \
  -H "X-API-Key: kolbo_live_..." \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Thunder clap followed by heavy rain"}'
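Because duration should be omitted rather than sent empty when automatic length is wanted, it can help to build the body programmatically. The following Python sketch shows one way to do that; sound_request_body is a hypothetical helper, and rejecting an empty prompt client-side is a choice of this sketch, not documented validation.

```python
import json


def sound_request_body(prompt, duration=None):
    """Serialize a Sound Effects request body, leaving out duration for auto."""
    if not prompt:
        raise ValueError("prompt is required")
    body = {"prompt": prompt}
    if duration is not None:
        body["duration"] = duration  # seconds, per the Request Body table
    return json.dumps(body)
```

The returned string can be sent as the POST body with the same X-API-Key and Content-Type headers as in the curl example.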

Response

Generation Started

{
  "success": true,
  "generation_id": "snd123",
  "type": "sound",
  "model": "auto",
  "credits_charged": 5,
  "poll_url": "/api/v1/generate/snd123/status",
  "poll_interval_hint": 5
}

Completed Status

{
  "success": true,
  "generation_id": "snd123",
  "state": "completed",
  "progress": 100,
  "result": {
    "urls": ["https://cdn.kolbo.ai/audio/..."],
    "model": "elevenlabs-sound",
    "duration": 8
  }
}

Tips

  • Speech and sound effect generation each typically take 5-30 seconds
  • Both return audio URLs (in result.urls of the completed status) that can be downloaded or streamed
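As a small illustration of the last tip, the Python sketch below pulls the audio URLs out of a completed status response and saves one to disk. The helper names are hypothetical, and it assumes the returned URLs can be fetched without the API key, which this page does not state.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def result_urls(status):
    """Return the audio URLs from a status response that has completed."""
    if status.get("state") != "completed":
        raise ValueError(f"generation not completed: {status.get('state')!r}")
    return list(status["result"]["urls"])


def download(url, directory="."):
    """Stream one audio URL to a file named after the last path segment."""
    name = os.path.basename(urlparse(url).path) or "audio"
    path = os.path.join(directory, name)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```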