Quick start
Core workflow
The typical agent loop is to snapshot the desktop, interpret the result, act on it, and repeat.
# 1. see the desktop
deskctl --json snapshot --annotate
# 2. click a window by its ref
deskctl click @w1
# 3. type into the focused window
deskctl type "hello world"
# 4. press a key
deskctl press enter
The --annotate flag draws colored bounding boxes and @wN labels on the screenshot so agents can visually identify windows.
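The four steps above can also be driven from a script. The following is a minimal Python sketch, not part of deskctl itself. It assumes the deskctl binary is on PATH, that --json snapshot prints its JSON document to standard output, and that a failed command exits non-zero. The helper names deskctl_argv, run_step, and agent_loop_once are hypothetical, and the runner parameter exists only so the sketch can be exercised without a desktop session.

```python
import json
import subprocess


def deskctl_argv(*args, json_output=False):
    """Build the argument list for one deskctl call.

    --json is placed before the subcommand, matching the commands shown above.
    """
    argv = ["deskctl"]
    if json_output:
        argv.append("--json")
    argv.extend(args)
    return argv


def run_step(argv, runner=subprocess.run):
    """Run one command and return its stdout as text.

    Assumption: deskctl exits non-zero on failure, so check=True raises.
    """
    result = runner(argv, capture_output=True, text=True, check=True)
    return result.stdout


def agent_loop_once(runner=subprocess.run):
    """One pass through the snapshot / click / type / press sequence."""
    # 1. see the desktop (assumes the JSON is printed on stdout)
    raw = run_step(deskctl_argv("snapshot", "--annotate", json_output=True), runner)
    snapshot = json.loads(raw)
    windows = snapshot["data"]["windows"]
    if not windows:
        return snapshot  # nothing to act on
    # 2. click the first listed window by its ref
    run_step(deskctl_argv("click", "@" + windows[0]["ref_id"]), runner)
    # 3. type into the focused window
    run_step(deskctl_argv("type", "hello world"), runner)
    # 4. press a key
    run_step(deskctl_argv("press", "enter"), runner)
    return snapshot
```

Passing arguments as a list avoids shell quoting problems with text such as "hello world". A real agent would replace the fixed click target with whatever it decides after interpreting the snapshot.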
Window refs
Every snapshot assigns a ref (@w1, @w2, and so on) to each visible window, numbered top-to-bottom by stacking order. Use these refs anywhere a selector is expected:
deskctl click @w1
deskctl focus @w3
deskctl close @w2
You can also select windows by name (case-insensitive substring match):
deskctl focus "firefox"
deskctl close "terminal"
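To make the two selector forms concrete, here is a small Python sketch of how an agent might predict which windows a selector refers to before issuing a command. It is an illustration, not deskctl's actual matching code. The function name match_windows is hypothetical, the window dictionaries use the ref_id, title, and app_name fields from the JSON output, and matching names against both title and app_name is an assumption, since the text above only says "by name".

```python
def match_windows(selector, windows):
    """Return the windows a selector would pick, under the stated assumptions.

    "@wN" selects by ref; anything else is treated as a case-insensitive
    substring. Which fields deskctl really compares is an assumption here.
    """
    if selector.startswith("@"):
        ref = selector[1:]  # "@w1" -> "w1", the form used in ref_id
        return [w for w in windows if w["ref_id"] == ref]
    needle = selector.lower()
    return [
        w for w in windows
        if needle in w["title"].lower() or needle in w["app_name"].lower()
    ]
```

A check like this lets an agent notice when a name matches more than one window and fall back to an explicit @wN ref instead.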
JSON output
Pass --json for machine-readable output. This is the primary mode for agent integrations:
deskctl --json snapshot
{
  "success": true,
  "data": {
    "screenshot": "/tmp/deskctl-1234567890.png",
    "windows": [
      {
        "ref_id": "w1",
        "xcb_id": 12345678,
        "title": "Firefox",
        "app_name": "firefox",
        "x": 0,
        "y": 0,
        "width": 1920,
        "height": 1080,
        "focused": true,
        "minimized": false
      }
    ]
  }
}
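A response like the one above can be consumed with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and treating "success": false as the failure signal is a guess, because only a successful response is shown here.

```python
import json


def parse_snapshot(raw):
    """Decode a snapshot response and return its "data" object."""
    response = json.loads(raw)
    # Assumption: a failed call sets "success" to false.
    if not response.get("success"):
        raise RuntimeError("deskctl reported failure")
    return response["data"]


def focused_window(data):
    """Return the first focused, non-minimized window, or None."""
    for window in data["windows"]:
        if window["focused"] and not window["minimized"]:
            return window
    return None


def center(window):
    """Center point of a window's bounding box, in integer pixels."""
    return (
        window["x"] + window["width"] // 2,
        window["y"] + window["height"] // 2,
    )
```

Note that ref_id is stored without the leading @, so a selector is built as "@" + window["ref_id"].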
Daemon lifecycle
The daemon starts automatically on the first command. It keeps the X11 connection alive so repeated calls are fast. You do not need to manage it manually.
# check if the daemon is running
deskctl daemon status
# stop it explicitly
deskctl daemon stop