Hybrid AI Protocol: Orchestrating Cloud and Local AI

Background
I will not define or describe AI agents here, since that information is easy to find on the web. Instead, I would like to share a framework I developed to balance the cognitive power of cloud AI with the cost-efficiency of local execution.

Powerful cloud models like Opus 4.x can be integrated to manage local files and interact with my computer, but routing every task through them, whether complex, basic, or recurring, consumes expensive tokens unnecessarily. Cloud models come in several pricing tiers, but constantly switching between them during rapid implementation cycles is inconvenient. Local open-source models are fantastic for handling basic, repetitive tasks with no per-token cost, but they are strictly bound by the hardware they run on.

Config
* Mac hardware: Mac mini M2 Pro with 32GB Unified Memory.
* Local Agent: “Hermes” framework running the gemma4:26b-mlx model.
* Cloud Agent: Claude running Opus 4.x.

Issue
My Mac has only 32GB of memory (a Mac Studio, by contrast, can be configured with up to 512GB of unified memory). On this hardware, the relatively small gemma4:26b-mlx model frequently truncated complex Python scripts mid-generation, abruptly cutting off critical logic right after the import statements and variable declarations.

To be clear, this truncation is due to the physical hardware constraints and the local model size, not a limitation or flaw in the Hermes agent framework itself. However, it meant the local agent could not be trusted to reliably write structural code.

Solution
To solve this, I built the Hybrid AI Orchestration Protocol: a directory-isolated, dual-agent “Walkie-Talkie” Blackboard architecture. The agents never speak directly; they coordinate via a single state.json file inside a .handover/ directory. This file separates liveness from intent: it tracks which agent currently holds the turn (the holder) and, separately, why it holds it (the purpose).
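
As a rough illustration of the blackboard idea, here is a minimal Python sketch of reading and writing such a state file. The field names (holder, purpose, attempt_count) and the helper functions are my own assumptions for illustration, not the exact schema of the protocol. The write goes through a temporary file and an atomic rename so that the other agent never reads a half-written state.json.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(handover_dir, holder, purpose, attempt_count=0):
    """Hand the turn to `holder`, recording why (field names are hypothetical)."""
    handover_dir = Path(handover_dir)
    handover_dir.mkdir(parents=True, exist_ok=True)
    state = {"holder": holder, "purpose": purpose, "attempt_count": attempt_count}
    # Write to a temp file in the same directory, then rename it into place.
    # os.replace is atomic on the same filesystem, so readers see either the
    # old state or the new one, never a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=handover_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp_path, handover_dir / "state.json")
    return state


def read_state(handover_dir):
    """Load the current state from .handover/state.json."""
    with open(Path(handover_dir) / "state.json", encoding="utf-8") as fh:
        return json.load(fh)


def my_turn(handover_dir, agent):
    """An agent acts only when the state file names it as the holder."""
    return read_state(handover_dir)["holder"] == agent
```

With something like this, Claude could call write_state(".handover", "hermes", "run_tests") after finishing a script, and Hermes would poll my_turn(".handover", "hermes") before touching anything.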

To prevent file conflicts and token waste, the protocol enforces strict roles:
1. The Client (Me): Initiates the Product Requirements Document (PRD), answers questions, and gives final sign-off.
2. Claude (The Architect & QA): Handles structural planning, complex coding, and final QA. Claude is also the sole Version Control Manager: Hermes is strictly locked out of autonomous Git operations so that a truncated script never overwrites the repository.
3. Hermes (The Executor): Restricted from structural coding. It reads Claude’s scripts, checks paths, installs pip dependencies, executes terminal tests, and reports errors back to Claude.
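
One way to picture the Git lockout is a small guard that checks a command before an agent runs it. This is only a sketch under my own assumptions: the ROLE_BLOCKED_COMMANDS table and the is_allowed helper are hypothetical names, and the check is deliberately naive (it only looks at the first program on the command line, so it would not catch Git invoked through a wrapper shell).

```python
import shlex

# Hypothetical block-list per agent; the only rule taken from the protocol
# is that Hermes must not run Git on its own.
ROLE_BLOCKED_COMMANDS = {
    "hermes": {"git"},
    "claude": set(),
}


def is_allowed(agent, command_line):
    """Return True if `agent` may run `command_line` under the role rules."""
    if agent not in ROLE_BLOCKED_COMMANDS:
        return False  # unknown agents are denied by default
    tokens = shlex.split(command_line)
    if not tokens:
        return False
    # Compare the bare program name, so "/usr/bin/git" is treated like "git".
    program = tokens[0].rsplit("/", 1)[-1]
    return program not in ROLE_BLOCKED_COMMANDS[agent]
```

For example, is_allowed("hermes", "pip install requests") passes, while is_allowed("hermes", "git commit -m 'fix'") is rejected.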

Tricks
What makes this system genuinely robust is a set of three guardrails that prevent common multi-agent failure modes:
* The DRILL-ME Phase: Before any code is written, Claude must interrogate me about the PRD. Crucially, Claude must ask operational questions on behalf of Hermes (e.g., confirming the exact macOS path for credentials). This ensures the local execution environment is mapped beforehand.
* Anti-Infinite-Loop: Agents can easily get stuck in an endless loop of failing a test and rewriting broken code. The state file tracks an attempt_count. If a script fails 5 times, it triggers an ABORTED state, forcing a human review.
* Append-Only Logs: During testing, failures are saved as append-only numbered logs (log_001.txt, log_002.txt) instead of overwriting the terminal output. This preserves the complete debugging history for Claude to analyze.
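
The last two guardrails can be sketched together in a few lines of Python. This is a simplified illustration, not my exact implementation: the record_failure and next_log_path helpers and the "status" field are hypothetical names, while the limit of 5 attempts, the ABORTED state, and the log_001.txt naming come from the protocol as described above.

```python
import re
from pathlib import Path

MAX_ATTEMPTS = 5  # after this many failures the run is handed to a human


def next_log_path(log_dir):
    """Return the next unused numbered log path (log_001.txt, log_002.txt, ...)."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    numbers = []
    for path in log_dir.iterdir():
        match = re.fullmatch(r"log_(\d+)\.txt", path.name)
        if match:
            numbers.append(int(match.group(1)))
    return log_dir / f"log_{max(numbers, default=0) + 1:03d}.txt"


def record_failure(state, log_dir, output):
    """Save the failure output to a new log and bump the attempt counter."""
    path = next_log_path(log_dir)
    # Mode "x" raises FileExistsError if the file already exists, so an
    # earlier log can never be overwritten.
    with open(path, "x", encoding="utf-8") as fh:
        fh.write(output)
    new_state = dict(state)
    new_state["attempt_count"] = new_state.get("attempt_count", 0) + 1
    if new_state["attempt_count"] >= MAX_ATTEMPTS:
        new_state["status"] = "ABORTED"  # hypothetical field name
    return new_state, path
```

Hermes would call record_failure after each failed test run and write the returned state back to state.json; once the status is ABORTED, neither agent continues until I review the logs.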
