NLP
Whisper & TTS
Speech-to-text and natural voice — the heart of Vaaani (वाणी = voice)
What it is
Whisper is OpenAI's open-source automatic speech recognition (ASR) model, which transcribes speech in roughly 99 languages. Paired with a modern TTS engine (ElevenLabs, OpenAI TTS, Coqui), it lets me build voice agents that listen, think, and speak in the user's language.
How Vaaani uses it
- WhatsApp voice-note → transcript → AI reply → voice response
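- Real-time meeting transcription with speaker diarization (speaker labels come from a separate diarization step, since Whisper itself does not identify speakers)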
- Multilingual customer support (Hindi, Bengali, English in one stream)
- Voice-driven Android apps for low-literacy users
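The first flow in the list above (voice note, transcript, AI reply, voice response) can be sketched as a small pipeline. This is a minimal illustration, not Vaaani's actual implementation: `handle_voice_note`, `VoiceReply`, and the three callables are hypothetical names. In practice the `transcribe` callable would wrap Whisper, and `synthesize` would wrap whichever TTS engine is in use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceReply:
    """Result of one voice-note round trip (hypothetical container)."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def handle_voice_note(
    audio_path: str,
    transcribe: Callable[[str], str],      # assumption: audio path -> text
    generate_reply: Callable[[str], str],  # assumption: user text -> reply text
    synthesize: Callable[[str], bytes],    # assumption: reply text -> audio bytes
) -> VoiceReply:
    """Run voice note -> transcript -> AI reply -> voice response."""
    transcript = transcribe(audio_path).strip()
    if not transcript:
        # Silence or noise: nothing to answer, so stop before calling the LLM.
        raise ValueError("empty transcript")
    reply_text = generate_reply(transcript)
    reply_audio = synthesize(reply_text)
    return VoiceReply(transcript, reply_text, reply_audio)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any model or API key.
    demo = handle_voice_note(
        "voice-note.ogg",
        transcribe=lambda path: " order status? ",
        generate_reply=lambda text: f"You asked: {text}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(demo.transcript, "|", demo.reply_text, "|", len(demo.reply_audio), "bytes")
```

Passing the stages in as callables keeps the orchestration independent of any one vendor, so the TTS engine can be swapped without touching the flow.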
Why it makes the cut
The brand name Vaaani (वाणी) means 'speech' or 'voice', so voice is core to the mission. With Whisper and modern TTS, voice agents can now sound close to human and cost cents per minute.
Sample code
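import whisper

# Load the large-v3 multilingual checkpoint (weights download on first use).
model = whisper.load_model("large-v3")

# Transcribe a Hindi recording; language="hi" skips automatic language detection.
result = model.transcribe("customer-call.mp3", language="hi")
print(result["text"])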
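Beyond the full text, the dictionary returned by `transcribe` also carries a `segments` list whose entries have `start`, `end` (in seconds), and `text` keys, which is useful for meeting notes. The sketch below turns those segments into timestamped lines. `format_timestamp` and `format_segments` are hypothetical helper names, and the sample dictionary is hand-written for illustration rather than real Whisper output.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.cc, rounding to hundredths of a second."""
    total_cs = round(seconds * 100)          # work in whole centiseconds
    minutes, cs = divmod(total_cs, 6000)     # 6000 centiseconds per minute
    return f"{minutes:02d}:{cs // 100:02d}.{cs % 100:02d}"


def format_segments(result: Dict) -> List[str]:
    """Build one '[start -> end] text' line per transcript segment."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


if __name__ == "__main__":
    # Made-up sample shaped like a transcribe() result, not real output.
    sample = {
        "text": " Hello and welcome. Let's begin.",
        "segments": [
            {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
            {"start": 2.5, "end": 4.0, "text": " Let's begin."},
        ],
    }
    for line in format_segments(sample):
        print(line)
```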
Have a project that needs Whisper?
30-min discovery call. You describe the busywork; I map it to an AI worker and a budget.