We Didn’t Want Another AI Wrapper — So We Explored a High-Speed Hermes Orchestrator for Engineering Crews

This is a submission for the Hermes Agent Challenge.

Our goal was not to build another AI wrapper, but to explore how Hermes Agent behaves as a persistent orchestration layer coordinating specialized autonomous workers inside real engineering governance workflows.

Most AI systems today are still fundamentally single-threaded assistants wrapped inside nicer interfaces.
You type a prompt, the model responds, and the workflow ends there.

But our problem was different.

Over the last few years, we worked closely with alumni groups, business operators, SaaS platforms, and community engineering teams. One issue came up everywhere:

People did not simply want AI-generated text.
They wanted workflow intelligence.

They wanted systems capable of:

coordinating technical tasks,
evaluating operational risks,
planning execution flows,
synthesizing structured engineering decisions,
and operating reliably across multiple autonomous workers.

That realization eventually led us toward Hermes Agent.

Not because we wanted another chatbot.

But because we wanted to explore orchestration.

The Core Idea

We started asking ourselves a simple question:

What happens when Hermes stops behaving like a conversational assistant and starts behaving like a managerial orchestration layer?

That question became the foundation of our experiment.

The result was Gotihub Hermes Crew.

The name itself carries the philosophy behind the project.

Gotihub is derived from the Bengali word Goti (গতি), meaning Speed.

We wanted to explore whether autonomous engineering workers could coordinate quickly, reliably, and structurally inside real governance workflows.

What emerged is a high-speed multi-agent engineering orchestration system that analyzes GitHub repositories through specialized autonomous workers coordinated by Hermes.

Project Links

Live Demo

https://crew.gotihub.com

GitHub Repository

https://github.com/apurba-labs/gotihub-hermes-crew

Why We Didn’t Want a Single Monolithic Agent

One massive prompt window handling:

security analysis,
architecture auditing,
roadmap planning,
and executive synthesis

quickly becomes expensive, unstable, and difficult to govern.

So instead of forcing one model to think about everything simultaneously, we separated:

Execution from Governance

Execution Layer

Specialized Gemma workers execute focused engineering tasks independently.

Governance Layer

Hermes coordinates, synthesizes, and manages the outputs generated by those workers.

That separation became the most important architectural decision in the project.

The Multi-Agent Architecture

Our orchestration pipeline runs four agents across three stages:

SecurityAgent performs repository security analysis.
ArchitectureAgent evaluates structural and maintainability health.
PlanningAgent generates engineering roadmap recommendations.
Hermes Master synthesizes everything into a structured managerial report.

The important detail is that the first stage runs its two workers, SecurityAgent and ArchitectureAgent, concurrently.

We intentionally used Python’s native asynchronous execution model instead of sequential blocking pipelines.

Stage 1 Concurrency with `asyncio.gather`

The first orchestration layer launches multiple specialized workers simultaneously:

SecurityAgent
ArchitectureAgent

Both execute inside a single `asyncio.gather()` call.

This allowed us to explore:

concurrent repository analysis,
isolated engineering responsibilities,
and structured task specialization.

Instead of treating AI as a single giant context window, we treated it like a coordinated engineering crew.
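A minimal sketch of that Stage 1 pattern is shown here, not the project's actual code. The function names `run_security_agent`, `run_architecture_agent`, and `run_stage_one` are hypothetical, and `asyncio.sleep` stands in for the real model calls.

```python
import asyncio
import time


async def run_security_agent(files: dict[str, str]) -> dict:
    # Placeholder for a real model call; the sleep simulates inference latency.
    await asyncio.sleep(0.2)
    return {"agent": "SecurityAgent", "files_seen": len(files), "issues": []}


async def run_architecture_agent(files: dict[str, str]) -> dict:
    # Second hypothetical worker, also simulated with a sleep.
    await asyncio.sleep(0.2)
    return {"agent": "ArchitectureAgent", "files_seen": len(files), "issues": []}


async def run_stage_one(files: dict[str, str]) -> list[dict]:
    # gather() schedules both coroutines on the event loop at once and
    # returns their results in the order the awaitables were passed in.
    results = await asyncio.gather(
        run_security_agent(files),
        run_architecture_agent(files),
    )
    return list(results)


if __name__ == "__main__":
    start = time.perf_counter()
    reports = asyncio.run(run_stage_one({"app/main.py": "print('hello')"}))
    elapsed = time.perf_counter() - start
    print([r["agent"] for r in reports], f"{elapsed:.2f}s")
```

With simulated workers, the two sleeps overlap, so the stage takes roughly as long as one worker rather than two. Whether real model calls overlap the same way depends on how the inference backend schedules concurrent requests.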

System Workflow Architecture

The orchestration workflow powering the system is intentionally separated into:

concurrent execution,
planning synthesis,
and executive orchestration.

This structure allowed us to keep responsibilities isolated while still producing a consolidated engineering report.
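The three parts can be sketched as one chained coroutine. This is an illustration under stated assumptions: all function names are hypothetical, the agent bodies are stubs, and the idea that planning consumes only the Stage 1 reports is our reading of the workflow rather than something the source confirms.

```python
import asyncio


async def security_agent(context: str) -> dict:
    # Stub worker: a real one would send the context to a model.
    return {"agent": "SecurityAgent", "summary": f"scanned {len(context)} chars"}


async def architecture_agent(context: str) -> dict:
    return {"agent": "ArchitectureAgent", "summary": "structure reviewed"}


async def planning_agent(stage_one: list[dict]) -> dict:
    # Assumption: planning works from the Stage 1 reports, not the raw repository.
    return {"agent": "PlanningAgent", "inputs": [r["agent"] for r in stage_one]}


async def hermes_master(stage_one: list[dict], plan: dict) -> dict:
    # Final stage: consolidate worker reports and the plan into one report.
    return {"workers": stage_one, "plan": plan}


async def run_pipeline(context: str) -> dict:
    # Stage 1: concurrent execution.
    stage_one = list(await asyncio.gather(
        security_agent(context),
        architecture_agent(context),
    ))
    # Stage 2: planning synthesis.
    plan = await planning_agent(stage_one)
    # Stage 3: executive orchestration.
    return await hermes_master(stage_one, plan)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("print('hi')")))
```

Each stage receives only the previous stage's output, which is what keeps responsibilities isolated while still ending in a single consolidated report.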

Hermes as the Orchestrator

This is where Hermes became genuinely interesting.

Hermes does not directly parse raw repositories in our architecture.

Instead, Hermes behaves like a managerial synthesis layer.

The worker agents generate:

summaries,
issue reports,
confidence scores,
and engineering recommendations.

Hermes then:

resolves overlap,
synthesizes cross-agent conclusions,
generates executive summaries,
and produces structured JSON outputs.

In other words:

The workers execute.
Hermes governs.

That orchestration philosophy changed how we approached agent systems entirely.
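To make the shape of that synthesis step concrete, here is a deterministic stand-in. In the real system Hermes performs this step with a model; the `WorkerReport` fields, the 0.0 to 1.0 confidence scale, and the output keys below are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkerReport:
    agent: str
    summary: str
    confidence: float  # hypothetical 0.0 to 1.0 scale
    issues: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)


def synthesize(reports: list[WorkerReport]) -> str:
    if not reports:
        raise ValueError("at least one worker report is required")

    # Resolve overlap: an issue raised by several workers is listed once,
    # together with every agent that reported it.
    merged_issues: dict[str, dict] = {}
    for report in reports:
        for issue in report.issues:
            key = issue.strip().lower()
            entry = merged_issues.setdefault(
                key, {"issue": issue.strip(), "raised_by": []}
            )
            if report.agent not in entry["raised_by"]:
                entry["raised_by"].append(report.agent)

    merged = {
        "executive_summary": " ".join(r.summary for r in reports),
        "issues": list(merged_issues.values()),
        "recommendations": sorted({rec for r in reports for rec in r.recommendations}),
        "mean_confidence": round(sum(r.confidence for r in reports) / len(reports), 2),
    }
    return json.dumps(merged, indent=2)
```

The point of the sketch is the contract: workers hand over structured reports, and the governing layer returns one JSON document in which duplicates have been merged.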

Multi-Subdomain Infrastructure Design

As the system evolved, we realized orchestration architecture alone was not enough.

We also needed infrastructure separation.

So we deployed the ecosystem using multiple subdomains and isolated routing layers:

gotihub.com → corporate site
agl.gotihub.com → SaaS engine
crew.gotihub.com → Hermes orchestration platform

Behind the scenes:

FastAPI handled orchestration,
Docker managed runtime isolation,
Nginx routed ingress traffic,
Ollama powered local inference,
and Hermes coordinated the synthesis layer.

Most importantly:

The inference backbone was never exposed directly to the public internet.

Internal AI Backbone Architecture

The deployment topology evolved into something closer to a lightweight orchestration mesh.

This allowed multiple services to share:

one centralized inference core,
isolated application routing,
and internal-only AI communication.

Real Engineering Problems We Hit

This project was not smooth.

And honestly, that’s where most of the learning happened.

The Local Compute Bottleneck

Our earliest orchestration runs were extremely slow.

One real telemetry session looked like this:

[TELEMETRY] GitHubLoader fetched 8 files in 5.91 seconds.

[Orchestrator] Starting Full Pipeline...
[TELEMETRY] Stage 1 took 218.68 seconds.
[TELEMETRY] Stage 2 took 72.19 seconds.
[TELEMETRY] Stage 3 took 120.18 seconds.

[TELEMETRY] Pipeline Complete! Total Runtime: 411.05 seconds.

The bottleneck was not orchestration.

It was:

oversized repository context,
local inference latency,
verbose prompt chains,
and massive token generation overhead.

That distinction mattered, because it meant the architecture itself was not the limiting factor. The inference strategy was what needed optimization.
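Per-stage numbers like the ones in the telemetry session can be collected with a small timing helper. The sketch below is one way to do it, not the project's logger: `stage_timer` is a hypothetical name, and only the three stage durations are copied from the log above. Those three durations add up to the reported total (218.68 + 72.19 + 120.18 = 411.05), so the total does not include the 5.91 seconds spent fetching files.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(name: str, timings: dict[str, float]):
    # perf_counter() is monotonic, so it is safe for measuring durations.
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the stage raised an exception.
        timings[name] = time.perf_counter() - start
        print(f"[TELEMETRY] {name} took {timings[name]:.2f} seconds.")


# Stage durations copied from the telemetry session above.
reported = {"Stage 1": 218.68, "Stage 2": 72.19, "Stage 3": 120.18}
total = sum(reported.values())
# Percentage of total pipeline runtime spent in each stage.
share = {name: round(100 * seconds / total, 1) for name, seconds in reported.items()}

if __name__ == "__main__":
    timings: dict[str, float] = {}
    with stage_timer("Demo stage", timings):
        time.sleep(0.1)
    print(f"total={total:.2f}s", share)
```

On those figures, Stage 1 accounts for a little over half of the runtime, which is consistent with repository context size and local inference being the dominant costs.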

What We Optimized

We eventually began improving runtime by:

reducing repository context size,
prioritizing critical engineering files,
limiting unnecessary token generation,
shrinking synthesis payloads,
and improving async orchestration boundaries.

The system became dramatically more stable once we stopped treating every file equally.
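One way to stop treating every file equally is to score files and fill a fixed character budget from the top. The weights, file names, and the `select_context` helper below are hypothetical; the source does not say which files were prioritized or how.

```python
from pathlib import PurePosixPath

# Hypothetical weights: a higher score means the file is kept first.
PRIORITY_NAMES = {"dockerfile": 3, "requirements.txt": 3, "package.json": 3, "main.py": 2}
PRIORITY_SUFFIXES = {".py": 1, ".ts": 1, ".yml": 1, ".yaml": 1}


def score(path: str) -> int:
    p = PurePosixPath(path)
    # A known file name wins; otherwise fall back to the extension.
    name_score = PRIORITY_NAMES.get(p.name.lower(), 0)
    return name_score or PRIORITY_SUFFIXES.get(p.suffix.lower(), 0)


def select_context(
    files: dict[str, str], max_chars: int, per_file_cap: int = 4000
) -> dict[str, str]:
    selected: dict[str, str] = {}
    budget = max_chars
    # Highest score first; ties are broken by path so the result is deterministic.
    for path in sorted(files, key=lambda p: (-score(p), p)):
        if score(path) == 0 or budget <= 0:
            continue
        # Truncate each file to the per-file cap and to the remaining budget.
        snippet = files[path][: min(per_file_cap, budget)]
        selected[path] = snippet
        budget -= len(snippet)
    return selected
```

Files with a score of zero are dropped entirely, and everything else is truncated, so the prompt size stays bounded regardless of repository size.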

Defensive Failure Engineering

One of the most important lessons came from structured output failures.

Large orchestration chains occasionally returned:

malformed JSON,
partial synthesis blocks,
or incomplete manager responses.

Instead of allowing pipeline collapse, we added:

fallback execution paths,
JSON cleanup layers,
defensive parsing,
and structured failure recovery.

That forced us to think less like prompt engineers and more like systems engineers.
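A defensive parsing layer of this kind might look like the sketch below. It is an assumption-laden illustration, not the project's cleanup code: `extract_json`, `parse_manager_response`, and the `status` and `raw_excerpt` keys are hypothetical names.

```python
import json
import re
from typing import Optional


def extract_json(raw: str) -> Optional[dict]:
    # Keep only the outermost {...} span. This also discards Markdown fences
    # and any conversational text the model wrapped around the JSON.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = raw[start : end + 1]
    # Remove trailing commas before a closing brace or bracket. This is a
    # heuristic: it would also alter a string value containing ",}" or ",]".
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def parse_manager_response(raw: str) -> dict:
    parsed = extract_json(raw)
    if parsed is None:
        # Fallback path: return a well-formed report that records the failure
        # instead of letting the whole pipeline raise.
        return {"status": "degraded", "executive_summary": "", "raw_excerpt": raw[:200]}
    return {"status": "ok", **parsed}
```

Downstream code can then branch on `status` rather than wrapping every consumer in its own exception handling.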

Why Hermes Actually Worked Well

Frameworks like CrewAI are excellent for rapidly assembling conversational agent pipelines.

But our exploration focused on something slightly different:

persistent orchestration,
structured engineering outputs,
governance-oriented workflows,
and isolated worker responsibilities.

We wanted Hermes to operate less like a conversational assistant and more like an engineering coordination layer.

That distinction became the entire philosophy behind the project.

What Fascinated Us Most

The most interesting part was not whether AI could generate text.

It was whether autonomous workers could coordinate reliably inside real operational systems.

That changes the conversation entirely.

Instead of asking:

“Can AI answer questions?”

We started asking:

“Can AI workers collaborate responsibly inside engineering governance workflows?”

Hermes gave us a practical way to explore that future.

And honestly, that exploration became far more valuable than simply building another AI wrapper.

Built With

Hermes Agent
FastAPI
Python AsyncIO
Ollama
Gemma 3
Docker
Nginx
SQLite
Next.js

Final Thoughts

This project is still evolving.

We are actively optimizing:

orchestration runtime,
inference efficiency,
streaming telemetry,
structured synthesis,
and governance reliability.

But the biggest thing we learned was this:

Autonomous systems become genuinely interesting when they stop behaving like isolated chatbots and start behaving like coordinated engineering workers.

That is the future we wanted to explore with Hermes.

And we are excited to continue building toward it.
