
Desktop Control

GPU-accelerated CLI for AI agents to control any macOS app via screen, mouse, and keyboard.


DesktopCtl

Local CLI for AI agents to observe and control your computer via screen, mouse, and keyboard. Bring your own AI: any model works, even one without vision.

Runs fully local. No screenshots sent to the cloud.

Learn more at https://desktopctl.com

https://github.com/user-attachments/assets/4321b23e-6706-4792-a911-89e13766ebc0

Why DesktopCtl

  • Local-first runtime with no cloud dependency
  • Bring your own AI: works with any desktop AI agent
  • GPU-accelerated text recognition and computer vision
  • Selector-first automation (--text, --token) with coordinate fallback
  • Agent-friendly explicit waits and post-action verification
  • Stable JSON contracts for agent integrations

Architecture

DesktopCtl is split into two binaries:

  • DesktopCtl.app (desktopctld): daemon that owns perception, state, execution, and verification
  • desktopctl: stateless CLI surface for actions and queries over local IPC

Repository layout:

  • src/desktop/core - shared protocol and types
  • src/desktop/daemon - daemon runtime
  • src/desktop/cli - CLI client

Current Scope

  • macOS-first
  • OCR-first perception pipeline
  • Tokenized screen output for agent grounding
  • Deterministic CLI primitives for click/type/wait flows

Prerequisites

  • macOS (current support target)
  • Rust toolchain (cargo)
  • just command runner
  • jq (used in the Quick Start to parse JSON output)
  • Accessibility permission for DesktopCtl.app
  • Screen Recording permission for DesktopCtl.app
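The command-line prerequisites above can be checked before running just build run. This is a minimal sketch and is not part of DesktopCtl. The function names missing_tools and preflight are hypothetical. The script includes jq only because the Quick Start pipes JSON through it. It does not check the Accessibility and Screen Recording permissions.

```shell
# Print each named command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical preflight check for building and running DesktopCtl.
# Returns non-zero if the host is not macOS or a required tool is missing.
# Accessibility and Screen Recording permissions are NOT checked here;
# grant them to DesktopCtl.app in macOS System Settings (Privacy & Security).
preflight() {
  local status missing
  status=0
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "warning: macOS is the current support target (found $(uname -s))" >&2
    status=1
  fi
  missing="$(missing_tools cargo just jq)"
  if [ -n "$missing" ]; then
    # Unquoted on purpose: joins the newline-separated names with spaces.
    echo "missing tools:" $missing >&2
    status=1
  fi
  return "$status"
}
```

Running preflight && just build run stops early with a readable message instead of failing partway through the build.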

Quick Start

just build run
raw="$(desktopctl app open Notes --json)"
win_id="$(printf '%s' "$raw" | jq -r '.result.window_id // empty')"
desktopctl keyboard press cmd+f --active-window "$win_id" --no-observe
desktopctl keyboard type "Shopping list" --active-window "$win_id" --no-observe
desktopctl screen tokenize --active-window "$win_id"
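As written, the Quick Start keeps going if jq finds no window_id. In that case the later commands receive an empty --active-window value. The following is a minimal hardened wrapper. Its only assumption is one the Quick Start already relies on: desktopctl app open <App> --json prints JSON containing a .result.window_id field. The helper name open_app_window is hypothetical.

```shell
# Hypothetical helper: open an app with desktopctl and print its window id.
# Fails (non-zero) if desktopctl fails or the JSON has no .result.window_id.
open_app_window() {
  local app raw win_id
  app="$1"
  raw="$(desktopctl app open "$app" --json)" || {
    echo "desktopctl app open $app failed" >&2
    return 1
  }
  # Same jq filter as the Quick Start; prints nothing if the field is absent.
  win_id="$(printf '%s' "$raw" | jq -r '.result.window_id // empty')"
  if [ -z "$win_id" ]; then
    echo "no window_id in response: $raw" >&2
    return 1
  fi
  printf '%s\n' "$win_id"
}
```

With the helper defined, the Quick Start can begin with win_id="$(open_app_window Notes)" || exit 1 and then run the keyboard and screen commands unchanged.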

Status / Roadmap

  • Status: active development, with macOS-first CLI and daemon workflows already usable.
  • Improved reliability for text/token-driven actions and verification loops, plus stable machine-readable error codes.
  • Upcoming CLI features: a doctor command, richer window/app introspection, and --explain failure output.
  • Better local computer vision and semantic UI tokenization.
  • Multi-platform support.