When the bottleneck moves from writing to verifying

Coding agents invert the assumption CI was built on. Writing is cheap, parallel and tireless; verifying is expensive, sequential and human-paced. ACIG is our answer.

5 May 2026

Continuous integration was built around an assumption that no longer holds: writing code is the slow part. Pipelines were designed for a senior developer producing one to three considered pull requests a day, with logs written for tired humans to read at the end of that day. Five-to-twenty-minute build times. A reviewer ritualising the merge.

Coding agents quietly invert that. A modest fleet, five to ten agents working in parallel on a single repository, produces tens of pull requests a day. Work that used to take a person several hours takes the agent several minutes, and the agent is perfectly happy to retry until something passes. The chokepoint moves: writing is cheap, parallel and tireless; verifying is expensive, sequential and human-paced.

We built ACIG (the Agentic CI Gateway) because we kept watching the same shape of failure on real teams adopting agents. Humans rubber-stamped pull requests because they could not read at agent throughput, or CI bills went vertical because every retry hammered the frontier-model adjudicator, or, more insidiously, agents quietly gamed weak tests and shipped confidently broken code. The pipeline that worked fine for one cook in the kitchen does not work for fifty.

Verification needs to be cheap, parallel, structured and tiered

Our framing: verification needs to be cheap, parallel, structured and tiered. Most of the time you do not need a frontier model to tell you that a diff has a SQL string concatenation, a missing test, or a secret-shaped string in plain view. You need a fleet of small, specialised critics that run in parallel and report findings in a shape an agent can actually consume.

ACIG is a single Go binary. You install it from a Homebrew tap, drop it into a GitHub Action, and optionally wire it into a pre-push hook. On every diff it fans a configurable set of critics out across goroutines: a risk classifier, a style critic, a test-coverage smell, a security smell, a performance smell, and others. Most run on Ollama Cloud (gpt-oss:20b for the cheap tier, qwen3-coder:480b for the mid tier) at a small fraction of frontier prices. A frontier adjudicator (Claude Sonnet by default) is invoked sparingly: only when critics disagree, when the change is high-risk, or when a human has explicitly asked for it.
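To make the fan-out and the escalation rule concrete, here is a minimal sketch in Go. It is not ACIG's source: the `Finding` and `Critic` types, the `RunCritics` and `NeedsAdjudicator` functions, and the definition of "disagree" (some critics block while others do not) are all assumptions made for illustration, and the critics here are stubs rather than model calls.

```go
package main

import (
	"fmt"
	"sync"
)

// Finding is a hypothetical shape for one critic's report.
type Finding struct {
	Critic  string
	File    string
	Line    int
	Message string
	Block   bool // true if this critic thinks the change should not merge
}

// Critic is a hypothetical signature; a real critic would call a model.
type Critic func(diff string) Finding

// RunCritics fans the critics out across goroutines and waits for all of them.
func RunCritics(diff string, critics []Critic) []Finding {
	findings := make([]Finding, len(critics))
	var wg sync.WaitGroup
	for i, c := range critics {
		wg.Add(1)
		go func(i int, c Critic) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no mutex is needed.
			findings[i] = c(diff)
		}(i, c)
	}
	wg.Wait()
	return findings
}

// NeedsAdjudicator mirrors the three escalation conditions described above:
// high risk, an explicit human request, or disagreement between critics.
func NeedsAdjudicator(findings []Finding, highRisk, humanRequested bool) bool {
	if highRisk || humanRequested {
		return true
	}
	blocks := 0
	for _, f := range findings {
		if f.Block {
			blocks++
		}
	}
	// Assumed meaning of "disagree": some critics block and others do not.
	return blocks > 0 && blocks < len(findings)
}

func main() {
	critics := []Critic{
		func(string) Finding { return Finding{Critic: "style"} },
		func(string) Finding {
			return Finding{Critic: "security", File: "db.go", Line: 42,
				Message: "SQL string concatenation", Block: true}
		},
	}
	findings := RunCritics("example diff", critics)
	fmt.Println(NeedsAdjudicator(findings, false, false)) // prints true
}
```

The point of the shape is that the expensive call sits behind a cheap boolean: when every critic agrees and the change is not flagged as risky, the frontier model is never invoked.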

The output is a structured JSON verdict alongside a markdown summary. That detail matters more than it first appears. CI logs today are walls of ANSI-coloured text designed for humans to scan with their eyes; agents do dramatically better with machine-readable feedback, and the cost of an agent's iteration loop drops noticeably when it can consume findings as JSON rather than scrape stdout. Every ACIG run posts a sticky pull request comment with the markdown summary at the top and the full JSON inside a collapsed <details> block, so producing agents can fetch and parse the verdict deterministically on their next attempt.
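As an illustration of what "fetch and parse the verdict deterministically" could look like on the agent side, the sketch below pulls a JSON object out of a comment's `<details>` block and decodes it. The `Verdict` and `Finding` field names are hypothetical, not ACIG's documented schema, and the extraction assumes the block contains exactly one JSON object and that any `<summary>` text contains no braces.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Finding and Verdict use assumed field names, for illustration only.
type Finding struct {
	Critic  string `json:"critic"`
	File    string `json:"file"`
	Line    int    `json:"line"`
	Message string `json:"message"`
}

type Verdict struct {
	Status   string    `json:"status"`
	Findings []Finding `json:"findings"`
}

// ParseVerdict extracts the JSON object inside the <details> block of a
// pull request comment and decodes it into a Verdict.
func ParseVerdict(comment string) (Verdict, error) {
	var v Verdict
	start := strings.Index(comment, "<details>")
	end := strings.LastIndex(comment, "</details>")
	if start < 0 || end < start {
		return v, errors.New("no <details> block found")
	}
	block := comment[start+len("<details>") : end]
	// Take everything from the first '{' to the last '}' so that a
	// <summary> line or surrounding code fence is skipped.
	first := strings.Index(block, "{")
	last := strings.LastIndex(block, "}")
	if first < 0 || last < first {
		return v, errors.New("no JSON object in <details> block")
	}
	err := json.Unmarshal([]byte(block[first:last+1]), &v)
	return v, err
}

func main() {
	comment := "## Summary\n1 finding\n<details><summary>Full verdict</summary>\n" +
		`{"status":"fail","findings":[{"critic":"security","file":"db.go","line":42,"message":"SQL string concatenation"}]}` +
		"\n</details>"
	v, err := ParseVerdict(comment)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	f := v.Findings[0]
	fmt.Printf("%s %s:%d %s\n", v.Status, f.File, f.Line, f.Message)
}
```

Compared with scraping coloured log output, the agent here gets a file, a line and a message per finding with no heuristics beyond locating the block.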

What a run looks like

A typical run on a non-trivial pull request returns a verdict in five to twenty seconds, costs between one and ten pence, and surfaces findings specific enough, referenced by file and line, that the agent can act on them in its next iteration without human intervention. The pre-push hook runs even faster on the cheap tier alone, gating problematic commits before they reach CI in the first place.

Honest caveats

ACIG is a critic, not an oracle. It catches shape-of-bad-code problems, not deep architectural ones, and it cannot tell you whether the change actually solves your business problem. The cheap critics will occasionally flag things that are not real issues, and you will want to tune which critics run on which paths. The cost numbers above assume Ollama Cloud as the workhorse; if you push everything through a frontier API you will pay an order of magnitude more for marginal accuracy gains.
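Tuning which critics run on which paths amounts to a small routing table. The sketch below shows one way to express that idea in Go; the `Rule` type, the `CriticsFor` function, the prefix-matching rule and the first-match-wins ordering are assumptions for illustration, not ACIG's actual configuration format.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule maps a path prefix to the critics that should run on matching files.
// This is a hypothetical structure, not ACIG's config schema.
type Rule struct {
	Prefix  string
	Critics []string
}

// CriticsFor returns the critics of the first rule whose prefix matches the
// file, or the fallback set when no rule matches.
func CriticsFor(file string, rules []Rule, fallback []string) []string {
	for _, r := range rules {
		if strings.HasPrefix(file, r.Prefix) {
			return r.Critics
		}
	}
	return fallback
}

func main() {
	rules := []Rule{
		{Prefix: "docs/", Critics: []string{"style"}},
		{Prefix: "internal/db/", Critics: []string{"security", "performance"}},
	}
	fallback := []string{"style", "test-coverage", "security"}

	fmt.Println(CriticsFor("docs/intro.md", rules, fallback))
	fmt.Println(CriticsFor("internal/db/query.go", rules, fallback))
	fmt.Println(CriticsFor("cmd/main.go", rules, fallback))
}
```

A table like this is also a natural place to silence a critic that keeps raising false positives on generated or vendored code.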

The structural shift matters more than any individual critic

What we have found in practice is that the structural shift matters more than any individual critic. Once you accept that the writer is parallel and the verifier is the constrained resource, you start engineering CI like a kitchen built for fifty cooks rather than one: automated taste-tests, structured signals, the head chef called only when something genuinely smells off. ACIG is our first cut at that kitchen.

It is also one half of an architecture: the verification half. The other half, hardening the human's intent before any agent touches the code, is the subject of our next post.

Install

# Homebrew
brew tap helloodokai/tap
brew install acig

Documentation and source at github.com/helloodokai/acig.
