Hold a hotkey. Talk. Release. The transcript is polished into structured text and pasted at your cursor in any app — Slack, VS Code, Notes, your browser, anywhere. Choose local Whisper for full privacy, or cloud backends for zero memory and WisprFlow-tier latency.
curl -fsSL https://raw.githubusercontent.com/danilobrando/susurro/main/install.sh | bash
When you hold ⌥ Option, a floating pill appears with 16 bars that respond to your voice.
Release the key — Whisper transcribes on-device or in the cloud, an LLM polishes the structure (lists, fillers, backtrack), and the text pastes at your cursor.
A voice dictation tool that respects your choices about privacy, latency, and where inference runs.
About 0.7 s from key release to pasted text with Groq's hosted Whisper + Llama. That is faster than WisprFlow on most networks.
Ordinals become numbered lists. Fillers ("eh", "um", "o sea") get removed. Self-corrections ("actually 3 PM") get applied. The LLM only fires when triggers detect a payoff.
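As a rough illustration of how trigger detection could gate the LLM call, here is a minimal sketch. The patterns, the trigger names, and the `polish_triggers` / `needs_llm` helpers are hypothetical and are not taken from Susurro's code; the real rules are likely broader.

```python
import re

# Hypothetical trigger patterns; Susurro's actual rules may differ.
FILLERS = re.compile(r"\b(?:eh|um|uh|o sea)\b", re.IGNORECASE)
ORDINALS = re.compile(r"\b(?:first|second|third|fourth|fifth)\b", re.IGNORECASE)
CORRECTIONS = re.compile(r"\b(?:actually|I mean|scratch that)\b", re.IGNORECASE)


def polish_triggers(raw: str) -> set[str]:
    """Return the names of the triggers that fire on a raw transcript."""
    found = set()
    if FILLERS.search(raw):
        found.add("fillers")
    # Two or more ordinals suggest the speaker is dictating a list.
    if len(ORDINALS.findall(raw)) >= 2:
        found.add("list")
    if CORRECTIONS.search(raw):
        found.add("backtrack")
    return found


def needs_llm(raw: str) -> bool:
    """Only call the polish LLM when at least one trigger fired."""
    return bool(polish_triggers(raw))


if __name__ == "__main__":
    for sample in ("um first buy milk second call mom",
                   "Send the report tomorrow"):
        print(sample, "->", sorted(polish_triggers(sample)))
```

Transcripts that fire no trigger can be pasted as-is, which is what keeps the average latency and cost down.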
Run speech-to-text locally (MLX Whisper, no network) or in the cloud (Groq today; OpenAI and Deepgram planned). The polish LLM is chosen independently (Groq today; Anthropic and Gemini planned).
Local STT keeps audio on-device. Cloud STT trades that for zero local RAM and lower latency. Polish can run in the cloud on text alone. You pick the mix.
Every (raw → polished) pair is appended to ~/.susurro/polish.jsonl so you can inspect what changed. Local only, never sent anywhere.
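To review the log, a small script along these lines may help. It assumes each line is a JSON object with `raw` and `polished` keys; those field names are a guess, so check a line of your own `polish.jsonl` and adjust.

```python
import json
from pathlib import Path


def parse_pairs(lines):
    """Parse JSONL lines into (raw, polished) tuples, skipping blank lines.

    The "raw" and "polished" keys are assumed, not confirmed by the project.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        pairs.append((record.get("raw", ""), record.get("polished", "")))
    return pairs


def changed_pairs(pairs):
    """Keep only the entries where polish actually altered the text."""
    return [(raw, polished) for raw, polished in pairs if raw != polished]


if __name__ == "__main__":
    log = Path.home() / ".susurro" / "polish.jsonl"
    if log.exists():
        with log.open(encoding="utf-8") as fh:
            for raw, polished in changed_pairs(parse_pairs(fh)):
                print(f"- {raw}\n+ {polished}\n")
```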
No subscription. No telemetry. ~$0.06 / 1000 dictations if you use Groq (free tier covers normal use). $0 if you stay local.
Each stage of the pipeline (STT → polish) can use a different backend. Some are implemented now; more are on the way.
| Stage | Backend | Memory | Latency (5 s clip) | Cost / 1k dictations | Status |
|---|---|---|---|---|---|
| STT | local (MLX Whisper) | ~3 GB | ~1.0 s | $0 | Shipped |
| STT | groq | 0 GB | ~0.15 s | $0.06 | Shipped · default |
| STT | openai gpt-4o-transcribe | 0 GB | ~0.6 s | $0.50 | v0.4 |
| STT | deepgram Nova-3 | 0 GB | ~0.2 s | $0.36 | v0.4 |
| Polish | off · raw STT | 0 GB | 0 ms | $0 | Shipped |
| Polish | rules · regex only | 0 GB | ~5 ms | $0 | Shipped |
| Polish | smart · Groq Llama 3.3 70B | 0 GB | ~0.3 s | $0.06 | Shipped · default |
| Polish | anthropic Claude Haiku 4.5 | 0 GB | ~0.5 s | $0.40 | v0.4 |
| Polish | gemini Flash | 0 GB | ~0.4 s | $0.01 | v0.4 |
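The per-stage figures can be added up to compare pipelines. The sketch below does that for the shipped backends only, using the numbers from the table. Treat the results as rough: end-to-end time also includes network and paste overhead that the table does not list, and the summed cost is an upper bound because the polish LLM fires only when triggers detect a payoff. The short backend keys (`"off"`, `"rules"`, `"smart"`) are labels for this example.

```python
# Per-stage figures copied from the backend table:
# (latency in seconds for a 5 s clip, cost in $ per 1k dictations).
STT = {
    "local": (1.0, 0.00),
    "groq": (0.15, 0.06),
}
POLISH = {
    "off": (0.0, 0.00),
    "rules": (0.005, 0.00),
    "smart": (0.3, 0.06),
}


def pipeline_estimate(stt: str, polish: str) -> tuple[float, float]:
    """Sum the table's latency and cost for one STT + polish combination."""
    stt_latency, stt_cost = STT[stt]
    polish_latency, polish_cost = POLISH[polish]
    return (round(stt_latency + polish_latency, 3),
            round(stt_cost + polish_cost, 2))


if __name__ == "__main__":
    for stt in STT:
        for polish in POLISH:
            latency, cost = pipeline_estimate(stt, polish)
            print(f"{stt:>5} + {polish:<5}  ~{latency:.3f} s  ${cost:.2f} / 1k")
```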
Requires macOS 13+ on Apple Silicon and Python 3.10+. Installation takes about 3 minutes, including dependencies.
curl -fsSL https://raw.githubusercontent.com/danilobrando/susurro/main/install.sh | bash
The script installs Python via Homebrew if needed, sets up pipx, installs Susurro from GitHub, and prints the next-step instructions for setting your API key and granting macOS permissions.
pipx install git+https://github.com/danilobrando/susurro
Use this if you already have pipx. Once it finishes, susurro is on your PATH.
Get a free key at console.groq.com/keys. The free tier covers normal personal use. Add it to your shell rc:
echo 'export SUSURRO_GROQ_API_KEY="gsk_..."' >> ~/.zshrc && source ~/.zshrc
Then run susurro. The first hotkey press will prompt for Microphone, Accessibility, and Input Monitoring — grant each, then restart your terminal.
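If the daemon complains about a missing key, a quick check that the variable is visible to new processes can save a debugging round. This is a sketch, not part of Susurro; the `gsk_` prefix test is an assumption based on the example key above, and `groq_key_status` is a made-up helper name.

```python
import os


def groq_key_status(env=None) -> str:
    """Report whether SUSURRO_GROQ_API_KEY looks usable in the given environment."""
    env = os.environ if env is None else env
    key = env.get("SUSURRO_GROQ_API_KEY", "")
    if not key:
        return "missing"
    # Assumption: Groq keys start with "gsk_", as in the example export line.
    if not key.startswith("gsk_"):
        return "unexpected format"
    return "ok"


if __name__ == "__main__":
    print(groq_key_status())
```

Run it from a fresh terminal window, since an already-open shell will not see the new export until you source the rc file.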
Same core UX (hotkey-driven dictation with smart formatting), but Susurro is free, MIT-licensed, and lets you choose where inference runs. WisprFlow is closed-source, paid, and cloud-only. Susurro also exposes the polish log locally so you can audit every edit.
Audio leaves your machine only if you choose a cloud STT backend. With STT_BACKEND="local", audio stays on-device; only the transcript text goes to the polish LLM, and only if you set POLISH_BACKEND to a cloud provider. With both stages local, nothing leaves.
If a cloud STT request fails, the daemon catches the error, switches to the local MLX backend for that request and all later requests in the session, and posts a macOS notification to tell you. You don't lose the transcription.
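The fallback behaviour described here can be sketched as a small wrapper. The class and parameter names are hypothetical, and the backends and notifier are passed in as plain callables so the example stays self-contained; Susurro's daemon may be structured quite differently.

```python
from typing import Callable


class FallbackSTT:
    """Try the cloud backend first; after one failure, stay local for the session."""

    def __init__(self,
                 cloud: Callable[[bytes], str],
                 local: Callable[[bytes], str],
                 notify: Callable[[str], None]):
        self.cloud = cloud
        self.local = local
        self.notify = notify
        self.use_local = False  # flips to True after the first cloud error

    def transcribe(self, audio: bytes) -> str:
        if not self.use_local:
            try:
                return self.cloud(audio)
            except Exception as exc:
                # Remember the failure so later requests skip the cloud entirely.
                self.use_local = True
                self.notify(f"Cloud STT failed ({exc}); using local Whisper "
                            "for the rest of this session.")
        # The same audio is retried locally, so the dictation is not lost.
        return self.local(audio)
```

Keeping the switch sticky for the session avoids paying the cloud timeout on every dictation while the provider is down.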
Local Whisper holds ~3 GB of RAM continuously. Most users prefer trading that for ~$0.06 per 1000 dictations and lower latency. If you want full local, change one line in susurro/config.py: STT_BACKEND = "local".
For now, yes. The package depends on Apple's MLX framework even if you only use cloud backends. v0.4 will split that into an optional extra so Intel Macs can run cloud-only.
Not currently. The menu bar UI uses macOS-specific frameworks (rumps, AppKit, PyObjC) and the global hotkey + paste pipeline relies on macOS Accessibility APIs. Porting would mean rewriting roughly half the codebase.
Yes. Edit HOTKEY in susurro/config.py. Default is "alt_r" (right Option). Any pynput Key name works: "alt_l", "ctrl_r", "f19", etc.
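Put together, the settings named in these answers might look like this in susurro/config.py. STT_BACKEND, POLISH_BACKEND, and HOTKEY are the names used above; the value "rules" for POLISH_BACKEND is an assumption based on the backend table, so check the file itself for the accepted values.

```python
# susurro/config.py (illustrative excerpt, not the full file)

# Keep audio on-device: run Whisper locally via MLX.
STT_BACKEND = "local"

# Assumed value name, taken from the "rules · regex only" row of the backend table.
POLISH_BACKEND = "rules"

# Any pynput Key name; "alt_r" (right Option) is the documented default.
HOTKEY = "alt_r"
```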