dictee
Speaking is just easier.
Speak freely, type instantly — 100% local voice dictation for Linux with 25+ languages, 5 translation backends, speaker diarization, and real-time visual feedback. Text appears right where your cursor is.
📚 New: the full dictee wiki is now online — 24 pages covering installation, configuration, all 4 ASR backends (with Parakeet-TDT and Canary-1B deep-dives), post-processing, diarization, troubleshooting, and developer guide. Available in 🇬🇧 English and 🇫🇷 French.
What is dictee? • Quick start • Features • Installation • Configuration • Usage • Post-processing • Limitations • Roadmap • Wiki
What is dictee?
dictee is a complete voice dictation system for Linux. Press a shortcut, speak, and the text is typed directly into the active application — any application, any window, any text field.
Transcription is performed 100% locally by default: no audio ever leaves your machine unless you explicitly choose a cloud translation backend.
- 🔒 100% local by default — Parakeet, Canary, faster-whisper and Vosk all run offline on your hardware
- 🌍 25+ languages — with native punctuation and capitalization (Parakeet-TDT)
- 🔀 4 ASR backends — switch instantly depending on language, latency and hardware
- 🎨 Visual feedback — KDE Plasma widget, system tray, or fullscreen animation
Quick start
Three steps to go from zero to dictation in under two minutes:
1. Install
curl -fsSL https://raw.githubusercontent.com/rcspam/dictee/master/install.sh | bash
2. Configure
The first-run wizard walks you through backend selection, model download and keyboard shortcut binding. Re-run anytime with dictee --setup.
3. Speak
Press your shortcut (default F9), speak, release. The transcription appears at your cursor.
For detailed install paths (manual .deb/.rpm, GPU prerequisites, AUR, from source), see Installation below or the wiki's Installation and GPU-Setup pages.
Features
4 ASR backends
| Backend | Languages | Model size | Warm latency | Notes |
|---|---|---|---|---|
| Parakeet-TDT 0.6B v3 | 25 | ~2.5 GB | ~0.8s CPU · ~0.16s GPU | Default, native punctuation |
| Canary-1B v2 | 25 | ~5 GB | ~0.7s GPU | Built-in translation (25 ↔ EN, 48 pairs) |
| faster-whisper | 99 | ~500 MB–3 GB | ~0.3s | Wide language coverage |
| Vosk | 20+ | ~50 MB | ~1.5s | Lightweight, strict offline |
Each backend runs as a systemd user service with the same Unix socket protocol — switching is transparent. → ASR-Backends wiki
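The shared socket protocol itself is not spelled out in this README, so the sketch below is purely illustrative: the socket path layout, the JSON message shape, and the `transcribe` command name are all assumptions, not dictee's real API. It only shows the general pattern of one client talking to interchangeable per-backend Unix socket services.

```python
import json
import os
import socket

# Illustrative sketch only: the real socket path and wire format used by
# dictee's backend services are assumptions here, not the documented API.
SOCKET_DIR = os.environ.get("XDG_RUNTIME_DIR", "/tmp")

def backend_socket_path(backend: str) -> str:
    """Build a per-backend socket path (hypothetical layout)."""
    return os.path.join(SOCKET_DIR, f"dictee-{backend}.sock")

def transcribe(backend: str, wav_path: str) -> str:
    """Send one transcription request and return the backend's reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(backend_socket_path(backend))
        s.sendall(json.dumps({"cmd": "transcribe", "wav": wav_path}).encode())
        return s.recv(65536).decode()
```

Because every backend answers on the same protocol, a client like this would not change when switching from Parakeet to Canary; only the service behind the socket does.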
5 translation backends
| Backend | Privacy | Speed | Quality | Languages |
|---|---|---|---|---|
| Canary-1B | 🔒 Local | Built-in | Excellent | 4 |
| LibreTranslate | 🔒 Local | 0.1–0.3s | Good | 30+ |
| Ollama | 🔒 Local | 2–3s | Excellent | Any (LLM) |
| Google Translate | 🌐 Cloud | 0.2–0.7s | Excellent | 130+ |
| Bing Translator | 🌐 Cloud | 1.7–2.2s | Very good | 100+ |
→ Translation wiki · Ollama-Setup
Post-processing pipeline
A 12-step configurable pipeline transforms raw ASR output before it hits your cursor:
- Regex rules + dictionary — 7 languages, ASR variants, voice commands → Rules-and-Dictionary
- LLM correction — optional fluency polish via local Ollama (first / last / hybrid position) → LLM-Correction
- Numbers & dates — cardinal, ordinal, versions, decimals, French times → Numbers-Dates-Continuation
- Continuation buffer — continue a sentence across dictations with last-word memory
- Short-text keepcaps — per-language exceptions for acronyms and names (new in v1.3)
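As a rough illustration of the continuation-buffer idea from the list above (the real implementation is part of dictee's Rust pipeline; the class name and merge rules here are invented for explanation only):

```python
# Toy sketch of a continuation buffer; the real dictee implementation is in
# Rust and its exact merge rules are not documented here. Names and rules
# below are invented for illustration.
class ContinuationBuffer:
    def __init__(self):
        self.last_word = None  # remembers the final word of the previous dictation

    def merge(self, text: str) -> str:
        """Join a new dictation onto the previous one.

        Drops a leading word that repeats the remembered last word, and
        lowercases the leading capital the ASR adds to a fresh utterance.
        """
        words = text.split()
        if self.last_word and words:
            if words[0].lower() == self.last_word.lower():
                words = words[1:]  # the speaker re-anchored on the last word
            if words:
                words[0] = words[0][0].lower() + words[0][1:]
        if words:
            self.last_word = words[-1].rstrip(".,;!?")
        return " ".join(words)
```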
Speaker diarization (Meetings)
Answer "who spoke when?" in multi-speaker recordings via NVIDIA's Sortformer model. Up to 4 speakers, ideal for meeting notes and interviews. Triggered via Meeting mode or dictee --meeting. → Diarization wiki
3 visual interfaces
- KDE Plasma 6 widget — native QML plasmoid, 5 animation styles, live state → Plasmoid-Widget
- System tray icon — PyQt6, works on GNOME/XFCE/Sway (AppIndicator fallback) → Tray-Icon
- animation-speech (external) — fullscreen overlay on wlr-layer-shell compositors
All three share state via a filesystem watcher — any change is reflected instantly across interfaces (multi-user safe with UID suffix).
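A minimal sketch of that shared-state convention, assuming a plain text state file rewritten atomically so a watcher never sees a half-written value; the base directory and file format are assumptions, only the UID suffix comes from this README:

```python
import os

# Sketch of the shared-state convention: one small file per user, rewritten
# atomically so every filesystem watcher sees a complete value. The base
# directory and file format are assumptions; only the UID suffix is stated
# in the README.
def state_file_path(base_dir: str = "/tmp") -> str:
    """Per-user state file; the UID suffix keeps concurrent users apart."""
    return os.path.join(base_dir, f"dictee-state-{os.getuid()}")

def write_state(state: str, base_dir: str = "/tmp") -> None:
    tmp = state_file_path(base_dir) + ".tmp"
    with open(tmp, "w") as f:
        f.write(state)  # e.g. "idle", "recording", "transcribing"
    os.replace(tmp, state_file_path(base_dir))  # atomic rename on POSIX

def read_state(base_dir: str = "/tmp") -> str:
    with open(state_file_path(base_dir)) as f:
        return f.read()
```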
animation-speech (fullscreen overlay)
animation-speech is a standalone project that provides a fullscreen visual animation during recording, with cancellation via the Escape key. It works on any Wayland compositor supporting wlr-layer-shell (KDE Plasma, Sway, Hyprland…).
sudo dpkg -i animation-speech_1.2.0_all.deb
Download: animation-speech releases
Note: animation-speech is not compatible with GNOME (no wlr-layer-shell support). GNOME users can rely on dictee-tray for visual feedback. Contributions for a GNOME Shell extension are welcome — see the plasmoid source for reference architecture.
Installation
One-liner (recommended)
Auto-detects distro and GPU, adds the NVIDIA CUDA repo if needed, installs the right package:
curl -fsSL https://raw.githubusercontent.com/rcspam/dictee/master/install.sh | bash
Supported: Ubuntu, Debian, Fedora, openSUSE, Arch Linux. Other distros fall back to the tarball path.
Options (after --):
# Force CPU (skip GPU detection)
curl -fsSL https://raw.githubusercontent.com/rcspam/dictee/master/install.sh | bash -s -- --cpu
# Force GPU (CUDA)
curl -fsSL https://raw.githubusercontent.com/rcspam/dictee/master/install.sh | bash -s -- --gpu
# Pin a specific version
curl -fsSL https://raw.githubusercontent.com/rcspam/dictee/master/install.sh | bash -s -- --version 1.3.0
# Non-interactive
curl -fsSL https://raw.githubusercontent.com/rcspam/dictee/master/install.sh | bash -s -- --non-interactive
Manual install
Download from Releases.
Ubuntu / Debian (CPU):
sudo apt install ./dictee-cpu_1.3.0_amd64.deb
Ubuntu / Debian (GPU): requires the NVIDIA CUDA APT repo — see GPU-Setup for the one-time setup, then:
sudo apt install ./dictee-cuda_1.3.0_amd64.deb
Fedora / openSUSE (CPU):
sudo dnf install ./dictee-cpu-1.3.0-1.x86_64.rpm
Fedora / openSUSE (GPU): add the CUDA repo first (see GPU-Setup), then dictee-cuda-1.3.0-1.x86_64.rpm.
Arch Linux (AUR): PKGBUILD in the repo root (x86_64 + aarch64). Clone + makepkg -si.
aarch64 / Jetson: no pre-built package — build from source. CUDA support on aarch64 is limited to NVIDIA Jetson boards.
Other distros (tarball):
tar xzf dictee-1.3.0_amd64.tar.gz
cd dictee-1.3.0
sudo ./install.sh
From source: cargo build --release --features sortformer then sudo ./install.sh. See Developer-Guide for full Cargo features and package build scripts.
Configuration
First launch triggers a setup wizard (backend, model, shortcuts).
Reconfigure anytime from the application menu, tray icon, Plasma widget, or by running:
dictee --setup
Backend switching (one-liner)
# Show current backends
dictee-switch-backend status
# Switch ASR (parakeet · canary · whisper · vosk)
dictee-switch-backend asr canary
# Switch translation (canary · libretranslate · ollama · google · bing)
dictee-switch-backend translate ollama
The tray and plasmoid include backend sub-menus — no terminal required.
For detailed configuration (all ASR backends, translation matrix, plasmoid settings, keyboard shortcuts on tiling WMs), see the wiki:
- ASR-Backends · Translation
- Plasmoid-Widget · Tray-Icon
- Keyboard-Shortcuts (KDE/GNOME/Sway/i3/Hyprland)
Usage
# Simple dictation — transcribe and type
dictee
# Dictate + translate (default: system language → English)
dictee --translate
dictee --translate --ollama # 100% local via Ollama
# Change target language
DICTEE_LANG_TARGET=es dictee --translate # → Spanish
# Meeting mode (diarization, up to 4 speakers)
dictee --meeting
# Cancel ongoing dictation
dictee --cancel
# Test post-processing rules live
dictee-test-rules # interactive
dictee-test-rules --loop # continuous loop
dictee-test-rules --wav file.wav # from audio file
→ Full command reference: CLI-Reference wiki
Post-processing
dictee runs a configurable 12-step pipeline after transcription and before paste:
- ASR variants normalization
- Dictionary substitution
- Numbers & dates conversion
- Continuation buffer merge
- Regex rules (pre-LLM)
- LLM correction (optional, first position)
- Regex rules (post-LLM)
- Short-text exceptions (keepcaps)
- Extended match mode
- Final capitalization
- Translation (optional)
- Paste / inject
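Conceptually, the chain above is an ordered list of text-to-text functions. The toy sketch below uses three invented rules to show the shape; it is not dictee's shipped rule set:

```python
import re

# Toy three-step pipeline showing the shape of an ordered post-processing
# chain; the rules are invented examples, not dictee's actual rule set.
def normalize_variants(text: str) -> str:
    return text.replace("o k", "OK")  # map an ASR spelling variant

def convert_numbers(text: str) -> str:
    return re.sub(r"\bone point two\b", "1.2", text)  # spoken version number

def capitalize(text: str) -> str:
    return text[:1].upper() + text[1:]

PIPELINE = [normalize_variants, convert_numbers, capitalize]

def postprocess(text: str) -> str:
    for step in PIPELINE:  # order matters, as in the 12-step list above
        text = step(text)
    return text
```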
Configure via dictee --setup → Post-processing tab, or test rules live with dictee-test-rules.
→ Deep dives: Post-Processing-Overview · Rules-and-Dictionary · LLM-Correction · Numbers-Dates-Continuation
Known limitations
- Diarization + Parakeet on 8 GB GPU is capped around 10–15 min of audio. Parakeet-TDT loads the full mel-spectrogram in one pass (~185 MB VRAM per minute), which overflows consumer GPUs past ~15 min. Workarounds: split the file, disable diarization, or use the CPU backend. Auto-chunking is planned for the v1.3 final release. → Diarization wiki
- AMD / Intel GPUs are not currently supported — dictee falls back to CPU.
- No real-time streaming — Parakeet-TDT and Canary require the full utterance; only Nemotron (EN-only, via Rust binary) streams natively.
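The spectrogram ceiling in the first limitation can be sanity-checked with back-of-envelope arithmetic using the figures above (~185 MB of VRAM per minute of audio, ~2.5 GB for the model); the 1.5 GB headroom for CUDA context and fragmentation is an assumption:

```python
# Back-of-envelope check of the diarization limitation above, using the
# README's own figures (~185 MB of VRAM per minute of audio, ~2.5 GB for
# the Parakeet model). The 1.5 GB headroom value is an assumption.
MODEL_VRAM_GB = 2.5
SPECTROGRAM_GB_PER_MIN = 0.185

def max_minutes(gpu_vram_gb: float, headroom_gb: float = 1.5) -> float:
    """Minutes of audio that fit before the one-pass mel-spectrogram overflows."""
    free = gpu_vram_gb - MODEL_VRAM_GB - headroom_gb
    return max(free, 0.0) / SPECTROGRAM_GB_PER_MIN

# On an 8 GB card this gives roughly 21 minutes in theory; loading the
# Sortformer diarization model alongside shrinks it toward the observed
# 10-15 minute cap.
```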
For bug reports and workarounds, see Troubleshooting.
Roadmap
v1.3.0 (current) — Short-text keepcaps exceptions (7 languages), extended match mode, LibreTranslate purge models, continuation + translate fixes, version-number dictation, multi-user safe (UID suffix on state files), plasmoid cross-process toggles (LLM / Short / Meeting), 682 postprocess tests + 148 pipeline tests, theme-aware banner.
v1.4+ (planned)
- Chunked diarization — process files > 15 min via transcribe-diarize-batch (prototype validated: 54 min in 122 s)
- Hotword boosting — bias ASR decoding toward custom names (shallow fusion on TDT logits, Parakeet only)
- Whisper translate — multi-target translation via task="translate" (EN-only, offline)
- Moonshine CPU backend
- CLI speech-to-text — pipe audio, get text
- VAD — hands-free dictation without push-to-talk
- Streaming transcription with live text display
- Built-in overlay — replace external animation-speech
- AppImage / Flatpak packaging
- COSMIC / GNOME Shell applets (contributions welcome!)
→ Full history: Changelog wiki
Credits
The transcription engine builds on parakeet-rs by Enes Altun — Rust library for NVIDIA Parakeet inference via ONNX Runtime. The Rust Canary implementation was originally ported from onnx-asr by Ivan Stupakov and is now fully self-contained. Parakeet and Canary ONNX models are provided by NVIDIA (downloaded separately from HuggingFace, not redistributed by this project).
Keyboard input simulation uses dotool by geb (GPL-3.0).
License
This project is distributed under the GPL-3.0-or-later license (see LICENSE).
The original parakeet-rs code by Enes Altun is under the MIT license (see LICENSE-MIT). dotool is bundled under GPL-3.0.