
Whisper & TTS

Speech-to-text and natural voice — the heart of Vaaani (वाणी = voice)

What it is

Whisper is OpenAI's open-source automatic speech recognition (ASR) model, with transcription in 99 languages. Paired with modern TTS (ElevenLabs, OpenAI TTS, Coqui), I build voice agents that listen, think and speak in the user's language.

How Vaaani uses it

  • WhatsApp voice-note → transcript → AI reply → voice response
  • Real-time meeting transcription with speaker diarization
  • Multilingual customer support (Hindi, Bengali, English in one stream)
  • Voice-driven Android apps for low-literacy users
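The voice-note flow in the first bullet is a three-stage pipeline: speech-to-text, a reply step, then text-to-speech. Below is a minimal sketch of that wiring in plain Python, assuming each stage is a pluggable callable. The `Transcript` type, the `VoicePipeline` class and the `fake_*` stages are hypothetical placeholders, not Whisper's or any TTS vendor's API; in a real agent the transcriber would wrap Whisper's `model.transcribe` and the synthesizer would call a TTS service. Passing the detected language through to the reply and TTS stages is what lets one stream answer in Hindi, Bengali or English.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str
    language: str  # ISO 639-1 code, e.g. "hi", "bn", "en"


# Hypothetical stage signatures; real implementations would wrap
# Whisper, an LLM, and a TTS service respectively.
Transcriber = Callable[[bytes], Transcript]
Responder = Callable[[Transcript], str]
Synthesizer = Callable[[str, str], bytes]  # (reply_text, language) -> audio


@dataclass
class VoicePipeline:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def handle(self, voice_note: bytes) -> tuple[Transcript, str, bytes]:
        """Voice note in, spoken reply out, in the caller's language."""
        transcript = self.transcribe(voice_note)
        reply = self.respond(transcript)
        # Reuse the detected language so the answer is spoken back in it.
        audio = self.synthesize(reply, transcript.language)
        return transcript, reply, audio


# Stub stages (hypothetical) so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> Transcript:
    return Transcript(text=audio.decode("utf-8"), language="hi")


def fake_respond(t: Transcript) -> str:
    return f"[{t.language}] reply to: {t.text}"


def fake_synthesize(text: str, language: str) -> bytes:
    return f"<{language} audio>{text}".encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_respond, fake_synthesize)
transcript, reply, audio = pipeline.handle("namaste".encode("utf-8"))
print(reply)
```

Keeping the stages as plain callables means a local Whisper model, a hosted ASR API, or a test stub can be swapped in without touching the orchestration.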

Why it makes the cut

The brand name Vaaani means 'voice' or 'speech', so voice is core to the mission. With Whisper and modern TTS, voice agents can now sound natural and cost just cents per minute.

Sample code

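import whisper  # pip install -U openai-whisper (the ffmpeg CLI must be on PATH)

# "large-v3" is accurate but heavy; "small" or "medium" run faster
model = whisper.load_model("large-v3")

# language="hi" skips auto-detection and decodes the audio as Hindi
result = model.transcribe("customer-call.mp3", language="hi")
print(result["text"])  # the full transcript as a single string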
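For meeting transcripts, timestamps matter as much as text. Besides `result["text"]`, `model.transcribe` returns a `result["segments"]` list whose entries carry `start` and `end` (in seconds) and `text`. The helper sketched below assumes only those keys and turns each segment into a timestamped line; `format_timestamp` and `to_timestamped_lines` are illustrative names, and `sample_result` is a hand-written stand-in for real output so the snippet runs without a model. Speaker labels would still need a separate diarization step, since Whisper does not identify speakers on its own.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_timestamped_lines(result: dict) -> list[str]:
    """One line per Whisper segment: '[start --> end] text'."""
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result["segments"]
    ]


# Hypothetical stand-in for model.transcribe(...) output; only the keys used above.
sample_result = {
    "text": " Namaste. Aapka order kab aayega?",
    "segments": [
        {"start": 0.0, "end": 1.24, "text": " Namaste."},
        {"start": 1.24, "end": 3.5, "text": " Aapka order kab aayega?"},
    ],
}

for line in to_timestamped_lines(sample_result):
    print(line)
```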


Have a project that needs Whisper?

30-min discovery call. You describe the busywork; I map it to an AI worker and a budget.