Run AI agents in isolated Docker containers. Watch them work in real-time. Verify results automatically. Multi-provider support built-in.
$ uv run helios tasks/explore-desktop --watch
# Agent running in Docker container
# Open http://localhost:8080 to watch
Capabilities
A complete framework for orchestrating computer-use agents with real-time observation and automated verification.
Each agent runs in its own Docker container with full desktop access. Complete isolation ensures safe execution of any task.
Watch agents work through a live web viewer at localhost:8080. See every click, keystroke, and decision as it happens.
Define test.sh scripts to verify agent outcomes. Get clear pass/fail results with granular reward scores from 0 to 1.
Switch between Anthropic, OpenAI, Gemini, and AWS Bedrock with a single flag. Use the best model for each task.
Run multiple tasks in parallel with configurable concurrency. Perfect for benchmarks and large-scale evaluation runs.
Deploy to Daytona cloud sandboxes for scalable execution without local Docker. Enterprise-ready from day one.
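As a rough illustration of the test.sh verification described above, the sketch below scores two made-up checks and writes the fraction that passed as a reward between 0 and 1. The checks, the TARGET_DIR and REWARD_FILE variables, and the location of reward.txt are assumptions for illustration only; how Helios invokes the script and where it reads the reward from is not specified here.

```shell
#!/usr/bin/env bash
# Sketch of a test.sh verifier. The checks and file locations below are
# hypothetical; they are not taken from the Helios documentation.

# Print a reward in [0, 1]: the fraction of checks that pass for directory $1.
score_checks() {
  local dir="$1"
  local passed=0
  local total=2

  # Check 1 (hypothetical): the agent created notes.txt.
  if [ -f "$dir/notes.txt" ]; then
    passed=$((passed + 1))
  fi

  # Check 2 (hypothetical): notes.txt contains the word "done".
  if grep -q "done" "$dir/notes.txt" 2>/dev/null; then
    passed=$((passed + 1))
  fi

  awk -v p="$passed" -v t="$total" 'BEGIN { printf "%.2f\n", p / t }'
}

# Assumed defaults: inspect the desktop folder and write reward.txt in the
# current directory.
score_checks "${TARGET_DIR:-$HOME/Desktop}" > "${REWARD_FILE:-reward.txt}"
```

A script that only needs a pass/fail outcome could write 0 or 1 directly instead of a fraction.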
How it works
Helios orchestrates a clean pipeline from task definition to verified results. Every component is modular and extensible.
Define what the agent should do with instruction.md and task.toml configuration
Route to any LLM provider through a unified, type-safe interface
Execute in isolated Docker containers or scalable cloud sandboxes
Run test.sh scripts and collect reward scores automatically
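The steps above name three files: instruction.md, task.toml, and test.sh. A minimal sketch of scaffolding a task directory with them follows. It assumes all three sit together under tasks/my-task; the instruction text is invented, and task.toml is left as a commented placeholder because its supported keys are not listed here.

```shell
# Hypothetical task directory; only the three file names come from the steps above.
task_dir="tasks/my-task"
mkdir -p "$task_dir"

# Natural-language instruction for the agent (example text, not from Helios).
printf '%s\n' "Create a folder named reports on the desktop." > "$task_dir/instruction.md"

# Placeholder only: the task.toml schema is not described on this page.
printf '%s\n' "# Task configuration goes here." > "$task_dir/task.toml"

# Stub verifier; real checks would replace the comment line.
printf '%s\n' '#!/usr/bin/env bash' '# Verification checks go here.' > "$task_dir/test.sh"
chmod +x "$task_dir/test.sh"

ls "$task_dir"
```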
reward.txt: 0 | 1 | 0.0-1.0
Integrations
$ helios tasks/my-task -m claude-sonnet-4-20250514
Quick Start
Install, configure, and run your first agent.
# Install dependencies
$ uv sync
# Set up API keys and build Docker images
$ cp .env.example .env
$ docker build -t cua-desktop -f docker/Dockerfile.desktop .
# Run your first agent with live viewing
$ uv run helios tasks/explore-desktop --watch
# Open http://localhost:8080 to watch the agent
Join developers using Helios to run, observe, and verify computer-use agents at scale.