v0.2.0 · Free · MIT licensed

Voice dictation for macOS, your way.

Hold a hotkey. Talk. Release. The transcript is polished into structured text and pasted at your cursor in any app: Slack, VS Code, Notes, your browser, anywhere. Choose local Whisper for full privacy, or cloud backends for zero local memory use and WisprFlow-tier latency.

$ curl -fsSL https://raw.githubusercontent.com/danilobrando/susurro/main/install.sh | bash

When you hold Option, a floating pill appears with 16 bars that respond to your voice.

Release the key. Whisper transcribes on-device or in the cloud, an LLM polishes the structure (builds lists, removes fillers, applies self-corrections), and the text is pasted at your cursor.

Why Susurro

A voice dictation tool that respects your choices about privacy, latency, and where inference runs.

WisprFlow-tier latency

~0.7 s from key release to pasted text with Groq's hosted Whisper + Llama. Faster than WisprFlow on most networks.

🧠

Smart formatting

Ordinals become numbered lists. Fillers ("eh", "um", "o sea") get removed. Self-corrections ("actually 3 PM") get applied. The LLM only fires when triggers detect a payoff.
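A minimal sketch of the "rules first, LLM only when it pays off" idea, assuming regex-based filler removal and a simple trigger check. The names FILLERS, TRIGGERS, strip_fillers, and needs_llm are hypothetical, and the word lists are built only from the examples on this page; Susurro's actual rules may differ.

```python
import re

# Hypothetical filler list, taken from the examples on this page.
FILLERS = ("eh", "um", "o sea")

# Hypothetical trigger cues: ordinals hint at a list, "actually" hints at a self-correction.
TRIGGERS = re.compile(r"\b(first|second|third|actually)\b", re.IGNORECASE)

# Match a whole filler word, an optional trailing comma, and following whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove filler words with a regex and tidy leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def needs_llm(text: str) -> bool:
    """Return True only when a trigger cue suggests the LLM pass would pay off."""
    return TRIGGERS.search(text) is not None
```

With this split, a sentence like "send the report tomorrow" never reaches the LLM, while "first buy milk, second call mom" does.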

🔀

Hot-swap backends

Run speech-to-text locally (MLX Whisper, no network) or in the cloud (Groq today; OpenAI and Deepgram planned). The polish LLM is configured independently (Groq today; Anthropic and Gemini planned).

🔒

Your privacy posture, your call

Local STT keeps audio on-device. Cloud STT trades that for zero RAM and lower latency. Polish can run in the cloud on text only. You pick the mix.

📝

Local audit log

Every (raw → polished) pair is appended to ~/.susurro/polish.jsonl so you can inspect what changed. Local only, never sent anywhere.
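If you want to review the log from a script, something like the following could work. It is a sketch under assumptions: the path comes from this page, but the JSON key names "raw" and "polished" are guesses, so check one line of your own log and pass the real keys if they differ. changed_pairs is a hypothetical helper, not part of Susurro.

```python
import json
from pathlib import Path

# Log path named on this page.
LOG_PATH = Path.home() / ".susurro" / "polish.jsonl"


def changed_pairs(path: Path = LOG_PATH, raw_key: str = "raw",
                  polished_key: str = "polished"):
    """Return (raw, polished) pairs where polishing changed the text.

    The default key names are assumptions, not a documented schema.
    """
    pairs = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            raw, polished = record.get(raw_key), record.get(polished_key)
            if raw != polished:
                pairs.append((raw, polished))
    return pairs
```

Because JSONL keeps one record per line, the file can also be inspected with ordinary line-oriented tools.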

💰

Free, MIT

No subscription. No telemetry. ~$0.06 / 1000 dictations if you use Groq (free tier covers normal use). $0 if you stay local.

Backends

Each stage of the pipeline (STT → polish) can use a different backend. Backends marked Shipped work today; the rest are planned for the version shown.

| Stage | Backend | Memory | Latency (5 s clip) | Cost / 1k dictations | Status |
|---|---|---|---|---|---|
| STT | local (MLX Whisper) | ~3 GB | ~1.0 s | $0 | Shipped |
| STT | groq | 0 GB | ~0.15 s | $0.06 | Shipped · default |
| STT | openai gpt-4o-transcribe | 0 GB | ~0.6 s | $0.50 | v0.4 |
| STT | deepgram Nova-3 | 0 GB | ~0.2 s | $0.36 | v0.4 |
| Polish | off · raw STT | 0 | 0 ms | $0 | Shipped |
| Polish | rules · regex only | 0 | ~5 ms | $0 | Shipped |
| Polish | smart · Groq Llama 3.3 70B | 0 GB | ~0.3 s | $0.06 | Shipped · default |
| Polish | anthropic Claude Haiku 4.5 | 0 GB | ~0.5 s | $0.40 | v0.4 |
| Polish | gemini Flash | 0 GB | ~0.4 s | $0.01 | v0.4 |
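To compare mixes, you can add up the per-stage figures for the shipped backends. This is a rough sketch, not a benchmark: it assumes the two stages run back to back, leaves out network and paste overhead (which is why the ~0.7 s end-to-end figure is higher), and treats smart polish as running on every dictation, which overstates its cost because the LLM fires only when triggers detect a payoff. The estimate helper is hypothetical.

```python
# Figures copied from the backend table: (latency in seconds for a 5 s clip,
# cost in USD per 1,000 dictations). Shipped backends only.
STT = {"local": (1.0, 0.00), "groq": (0.15, 0.06)}
POLISH = {"off": (0.0, 0.00), "rules": (0.005, 0.00), "smart": (0.3, 0.06)}


def estimate(stt: str, polish: str) -> tuple[float, float]:
    """Sum per-stage latency and cost for one STT + polish combination."""
    stt_latency, stt_cost = STT[stt]
    polish_latency, polish_cost = POLISH[polish]
    # Round to hide floating-point noise in the sums.
    return round(stt_latency + polish_latency, 3), round(stt_cost + polish_cost, 2)


if __name__ == "__main__":
    for stt in STT:
        for polish in POLISH:
            latency, cost = estimate(stt, polish)
            print(f"{stt:>5} + {polish:<5}  ~{latency} s  ${cost:.2f} / 1k")
```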

Install

macOS 13+ on Apple Silicon. Python 3.10+. ~3 minutes including dependencies.

One-line install

$ curl -fsSL https://raw.githubusercontent.com/danilobrando/susurro/main/install.sh | bash

The script installs Python via Homebrew if needed, sets up pipx, installs Susurro from GitHub, and prints the next-step instructions for setting your API key and granting macOS permissions.

Or with pipx directly

$ pipx install git+https://github.com/danilobrando/susurro

Use this if you already have pipx. Afterwards, susurro is on your PATH.

Set your Groq key

Get a free key at console.groq.com/keys. The free tier covers normal personal use. Add it to your shell rc:

$ echo 'export SUSURRO_GROQ_API_KEY="gsk_..."' >> ~/.zshrc && source ~/.zshrc

Then run susurro. The first hotkey press will prompt for Microphone, Accessibility, and Input Monitoring — grant each, then restart your terminal.
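If Susurro cannot see the key, a quick check from Python can confirm whether the variable reached your shell session. This is a sketch only; groq_key_status is a hypothetical helper, and the "gsk_" prefix check is based solely on the example key shown above.

```python
import os
from collections.abc import Mapping


def groq_key_status(env: Mapping[str, str] = os.environ) -> str:
    """Report whether SUSURRO_GROQ_API_KEY is visible to the current process."""
    key = env.get("SUSURRO_GROQ_API_KEY", "").strip()
    if not key:
        return "missing"
    # The example key on this page starts with "gsk_"; flag anything else.
    return "ok" if key.startswith("gsk_") else "unexpected format"


if __name__ == "__main__":
    print(groq_key_status())
```

A result of "missing" in a new terminal usually means the export line never made it into ~/.zshrc or the file was not re-sourced.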

FAQ

How is this different from WisprFlow?

Same core UX (hotkey-driven dictation with smart formatting), but Susurro is free, MIT-licensed, and lets you choose where inference runs. WisprFlow is closed-source, paid, and cloud-only. Susurro also exposes the polish log locally so you can audit every edit.

Does my audio leave my machine?

Only if you choose a cloud STT backend. With STT_BACKEND="local", audio stays on-device; if you set POLISH_BACKEND to a cloud provider, only the raw transcript (text) goes to the polish LLM. With both stages local, nothing leaves.
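The answer above can be written down as a small lookup. This is an illustrative sketch: leaves_device is a hypothetical helper, and the backend values ("local", "groq", "off", "rules", "smart") are assumed from the shipped rows of the backend table rather than taken from the config file.

```python
# Assumption: these sets mirror the shipped backends in the table;
# the real config may accept other values.
CLOUD_STT = {"groq"}
CLOUD_POLISH = {"smart"}


def leaves_device(stt_backend: str, polish_backend: str) -> list[str]:
    """List which kinds of data are sent over the network for a backend mix."""
    sent = []
    if stt_backend in CLOUD_STT:
        sent.append("audio")
    if polish_backend in CLOUD_POLISH:
        sent.append("transcript text")
    return sent
```

For example, leaves_device("local", "smart") reports only transcript text, and leaves_device("local", "rules") reports nothing.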

What happens if my API key expires or Groq has an outage?

The daemon catches the error, switches to the local MLX backend for that and all future requests in the session, and posts a macOS notification telling you. You don't lose the transcription.
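The behavior described here is a sticky fallback: the first cloud failure flips the session to local and stays there. A minimal sketch of that pattern follows, assuming each backend is just a callable that takes audio bytes and returns text. FallbackSTT and its parameters are hypothetical names, not Susurro's actual classes.

```python
from typing import Callable

Transcriber = Callable[[bytes], str]


class FallbackSTT:
    """Try the cloud transcriber; after one failure, stay local for the session."""

    def __init__(self, cloud: Transcriber, local: Transcriber,
                 notify: Callable[[str], None] = print) -> None:
        self.cloud, self.local, self.notify = cloud, local, notify
        self.use_local = False  # flips to True after the first cloud failure

    def transcribe(self, audio: bytes) -> str:
        if not self.use_local:
            try:
                return self.cloud(audio)
            except Exception as exc:  # e.g. expired key or provider outage
                self.use_local = True
                self.notify(f"Cloud STT failed ({exc}); using local backend")
        # Reached on the failing request and on every later one.
        return self.local(audio)
```

Retrying the failed clip locally, rather than raising, is what keeps the current dictation from being lost.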

Why is the default not 100% local?

Local Whisper holds ~3 GB of RAM continuously. Most users prefer trading that for ~$0.06 per 1000 dictations and lower latency. If you want full local, change one line in susurro/config.py: STT_BACKEND = "local".

Do I need Apple Silicon?

For now, yes. The package depends on Apple's MLX framework even if you only use cloud backends. v0.4 will split that into an optional extra so Intel Macs can run cloud-only.

Is there a Windows or Linux version?

Not currently. The menu bar UI uses macOS-specific frameworks (rumps, AppKit, PyObjC) and the global hotkey + paste pipeline relies on macOS Accessibility APIs. Porting would mean rewriting roughly half the codebase.

Can I customize the hotkey?

Yes. Edit HOTKEY in susurro/config.py. Default is "alt_r" (right Option). Any pynput Key name works: "alt_l", "ctrl_r", "f19", etc.
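For a sense of how a key name like "alt_r" can drive hold-to-talk, here is a hedged sketch. It relies on pynput's special keys being enum members with a .name attribute, while character keys have none. make_handlers, on_start, and on_stop are hypothetical names, and this is not Susurro's actual listener code.

```python
from typing import Callable

HOTKEY = "alt_r"  # default named on this page


def make_handlers(hotkey: str, on_start: Callable[[], None],
                  on_stop: Callable[[], None]):
    """Build on_press/on_release callbacks for a hold-to-talk hotkey."""
    held = False

    def on_press(key) -> None:
        nonlocal held
        # Special keys expose .name; character keys fall through as None.
        if getattr(key, "name", None) == hotkey and not held:
            held = True  # ignore repeated press events while the key stays down
            on_start()

    def on_release(key) -> None:
        nonlocal held
        if getattr(key, "name", None) == hotkey and held:
            held = False
            on_stop()

    return on_press, on_release
```

The two callbacks can be passed to pynput's keyboard.Listener(on_press=..., on_release=...), which is why any pynput Key name is a plausible HOTKEY value.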