Aleksei Petrov
CTO at QuantFlow; builds AI agents that integrate with CI and issue trackers to automate coding and delivery with telemetry and controls.
Montenegro's Pilot AI Scores 82% Benchmark, Proving Hub Status
Next week I'm presenting at AIM Innovation Week in Podgorica. EU-backed event bringing together startups, investors, and corporates around Montenegro's innovation ecosystem. Showing how AI builds software autonomously. Our tool Pilot is scoring 82%+ on the global industry benchmark, built right here in Montenegro. Amazon, Deutsche Telekom, EBRD, European Commission in the room. Montenegro as an AI hub — not a pitch, a fact we're proving now.

Navigator: Top ClaudeCode Plugin for Structured Dev Workflows
Navigator is still #1 and the only ClaudeCode plugin I use 🧭 Built for experienced devs who care about roadmaps and thinking before execution. Screenshot is Nav loop mode in action: full cycle of execution, checks, tests before anything ships +...
One Person + Claude Equals Whole Team Productivity
Anthropic’s team, from the inside-out view. 1 person + Claude = full team output. The top people already work like this – they manage the whole department’s effort through Claude, instead of managing the department to produce the effort.
Removing Depth Limit Boosts Agent Success From 58% to 88%
Spent two weeks benchmarking Pilot on Terminal Bench 2.0. Ran 500+ tasks across 15 experiments. Built analysis pipelines. Measured variance. Compared agent behavior across pass vs fail runs. The fix that moved the needle? Removing one env var that forced maximum thinking...

CLI Version Beats Prompts and Node Upgrades
Node 18 + ClaudeCode 2.1.72 is a cheat code 😉 We benchmark Pilot on Terminal Bench 2.0. 89 real coding tasks, Opus 4.6, Modal containers. Ran 10+ full experiments over two days. The CLI tool version matters more than prompt engineering, effort...

Pilot Scores 10/10 on Terminal Bench 2.0 Pre-tests
Did a few updates to Pilot. Restarted Terminal Bench 2.0 pre-tests: 10/10 at the moment, 100% correctness 💪 This technology rocks https://pilot.quantflow.studio 2 months of hard pushing, and look at these amazing results.

Pilot Hits 68.5% Benchmark, Surpassing Claude Code
First full benchmark run on terminal-bench 2.0 – 15h run. Results: Pilot 68.5%, Claude Code 58%. +10.5 points, target achieved. Switched from Daytona to Modal after infra kept choking on heavy tasks. Night and day difference. 27 failures left to investigate. 7 are OOM kills, 3 were...

Reading Test File First Solved Pilot Debugging Delays
This test drove me crazy. Solid proof that Pilot works, but each pass takes forever when you're debugging infra. 4 days...
- Python wrapper to run Pilot (Go) inside Harbor's benchmark harness
- Migrated to Daytona sandboxes
- ~50 failed attempts on config, wrapper...
Switched to Daytona Claude, Opus Revived in Under a Minute
We’re still grinding through Harbor’s tests 🤦‍♂️ Overnight run died on my Mac, so I moved everything to Daytona’s Claude – amazing service with a clean CLI, Opus was back up in under a minute. I’ll keep you updated – next results...

Pilot on Harbor's Benchmark: 30–40 Minute Runs at About $1 Each
Focusing on Harbor’s benchmark to prove Pilot’s efficiency. The tests are fascinating, a real challenge 💪 and Pilot already has first results. Each run takes 30–40 minutes and costs about $1 for Pilot. Now waiting for the full report to see where we land...

Pilot Continuously Learns, Optimizing PR Pipelines Automatically
Pilot doesn't just ship tickets — it learns from them 📘 Every PR review → pattern extraction. Every CI failure → error diagnosis. Every self-review → convention learning. Cross-project memory with confidence scoring and decay. v3 roadmap 👀 Outcome-based model routing — Pilot...
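To make "confidence scoring and decay" concrete, here is a minimal sketch, assuming a half-life style decay. The `Pattern` class, the 30-day half-life, and the reinforcement weight are illustrative assumptions, not Pilot's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """A learned convention (hypothetical structure, for illustration only)."""
    description: str
    confidence: float  # 0.0 .. 1.0 at the time of the last confirmation
    age_days: float    # days since the pattern was last confirmed


def decayed_confidence(p: Pattern, half_life_days: float = 30.0) -> float:
    """Halve the confidence for every `half_life_days` without a confirmation."""
    return p.confidence * 0.5 ** (p.age_days / half_life_days)


def reinforce(p: Pattern, weight: float = 0.2) -> Pattern:
    """After a fresh confirmation, move the decayed score toward 1.0 and reset age."""
    current = decayed_confidence(p)
    return Pattern(p.description, current + weight * (1.0 - current), 0.0)
```

With these assumed numbers, a pattern at 0.8 confidence that goes unconfirmed for 30 days drops to 0.4, and one new confirmation lifts it to 0.52.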
Token Efficiency, Not Volume, Defines the ClaudeCode Edge
Everyone has ClaudeCode. The edge is how efficiently you spend tokens, not how much you spend. Agreed?

Opus 4.6 Designs SOC2 Auth Service as 35 Issues for Pilot
Asked Opus 4.6 to design an SOC2‑compliant auth service from zero. It came back with 35 issues. Pilot’s job now is to deliver them. Estimated cost: ~$4. Estimated time: ~1 hour + ~10 minutes of cleanup. --- Devs only have jobs until I get better...

ClaudeCode and Pilot: 2026’s Top AI Workspace
The best AI workspace in 2026? ClaudeCode + Pilot – AI automated delivery pipeline 🤌 https://pilot.quantflow.studio

Self‑review, Quality Gates, and Auto‑fix Loop Proven Effective
Anthropic's new research is out, and a few of my hypotheses just got confirmed. 1. Self‑review and quality gates matter. When users get less critical with polished outputs, automated verification layers compensate for that human tendency. 2. The iteration finding also...
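The quality-gate idea in point 1 can be sketched as a small check-and-fix loop. This is an illustration under stated assumptions: `run_with_gates`, the `Check` type, and the `fix` callback are hypothetical names, and a real pipeline would call tests, linters, or a reviewer model instead of the toy callables used here.

```python
from typing import Callable, List, Tuple

# A check takes the current draft and returns (passed, message).
Check = Callable[[str], Tuple[bool, str]]


def failed_messages(draft: str, checks: List[Check]) -> List[str]:
    """Run every check and collect the messages of those that failed."""
    return [msg for ok, msg in (check(draft) for check in checks) if not ok]


def run_with_gates(
    draft: str,
    checks: List[Check],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Verify the draft; on failure ask `fix` for a new draft, up to `max_rounds`."""
    for _ in range(max_rounds):
        failures = failed_messages(draft, checks)
        if not failures:
            return draft, True
        draft = fix(draft, failures)
    # Verify once more after the last fix attempt.
    return draft, not failed_messages(draft, checks)
```

The point of the design is that nothing is accepted on the strength of how polished it looks: the loop returns success only when every automated check passes.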

Solo Dev Delivers 200+ Features in 3 Weeks
When the platform catches up to your product, you're building in the right direction. Anthropic just announced auto-merge, CI monitoring, and code review for Claude Code. Pilot has had this since day one — shipped 3 weeks ago. But we didn't stop there: -...
Pilot v2.0 Launches Native Desktop App and Community
Two things shipping today.
🎉 Pilot v2.0.0
→ Native desktop app: macOS, Windows, Linux.
→ Deployment pipelines: dev/stage/prod/custom.
→ 3 execution backends: Claude Code, OpenCode, Qwen Code.
→ 200+ features. Self-hosted. Open source.
Download: github.com/alekspetrov/pilot/releases/tag/v2.0.0 (docs are coming, GitLab is down)
💬 Pilot Discord → Launching...

CI Turbulence Survived; Add a Fasten‑seat‑belt Alert
Pilot hit some CI turbulence and was fighting a stall 😁 Didn’t crash, didn’t stop — update shipped after a full recovery. Definitely need a “fasten seat belts” light for that phase, thought it was just stuck circling.

Pilot 1.40 Cuts Costs, Adds Smart Model Routing
Pilot v1.40.0 delivered 📦
> Sonnet 4.6 default for simple/medium tasks
> 40% cost drop on most executions
> Opus 4.6 reserved for complex work
> Haiku stays on classifiers
> near-Opus quality, preferred 59% over Opus 4.5
> smart routing: complexity detected, model matched
model_routing.enabled:...
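A rough sketch of the routing rule described in this release, assuming three complexity tiers. The `Complexity` enum, the `pick_model` function, the model label strings, and the fallback when routing is disabled are all illustrative assumptions, not Pilot's real configuration or API.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


# Mirrors the post: Sonnet for simple/medium tasks, Opus reserved for complex work.
# The strings are placeholder labels, not real model identifiers.
ROUTES = {
    Complexity.SIMPLE: "sonnet-4.6",
    Complexity.MEDIUM: "sonnet-4.6",
    Complexity.COMPLEX: "opus-4.6",
}

# The post says Haiku stays on classifiers, i.e. the step that detects complexity.
CLASSIFIER_MODEL = "haiku"


def pick_model(complexity: Complexity, routing_enabled: bool = True) -> str:
    """Return the model label for a task of the given complexity."""
    if not routing_enabled:
        # Assumption: with routing off, every task goes to the strongest model.
        return ROUTES[Complexity.COMPLEX]
    return ROUTES[complexity]
```

Usage: `pick_model(Complexity.MEDIUM)` returns the Sonnet label, while `pick_model(Complexity.COMPLEX)` returns the Opus label.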

Pilot Automates Your Roadmap: 133 Features in Two Weeks
Pilot v1.0.0 shipped 🎉 133 features. Built in 2 weeks. The last 22 issues of the v1.0 roadmap were executed by Pilot itself — decomposing epics, creating branches, running CI, merging PRs. → Label a ticket "pilot". Get a PR back. GitHub,...