Aleksei Petrov

CTO at QuantFlow; builds AI agents that integrate with CI and issue trackers to automate coding and delivery with telemetry and controls.

Social•Apr 9, 2026

Solo AI‑Built Pilot Tops Terminal‑Bench 2.0

Pilot — #1 on Terminal-Bench 2.0. 82.9% accuracy. 124 entries. Claude Opus 4.6. Built by a single person + AI in Montenegro. No VC. No cluster. Standard infra. Leaderboard is live: https://www.tbench.ai/leaderboard/terminal-bench/2.0 Open source: https://pilot.quantflow.studio

By Aleksei Petrov

Social•Apr 6, 2026

$5K/Mo Buys a Full AI‑powered Dev Studio

QuantFlow Studio is open for subscriptions 🎉 $5K/mo — one EU dev's cost — buys a whole studio's output. Engineering, design, AI integrations. 2 engineers orchestrating self-made agents, end-to-end. Proof we're not LARPing: • Pilot — 82% on Terminal Bench 2.0 (built in...

Social•Apr 5, 2026

14 Releases in One Day, Delivery Fully Automated

14 releases in one day. Delivery on autopilot 🛩️ Just checked the reports: Claude and Pilot are building.

Social•Apr 4, 2026

AI Agents Delivered Fully Tested Code Overnight

Set up two AI agents before bed last night.
- Pilot (executor): picks GitHub issues, writes code, ships.
- ClaudeCode (/loop to monitor): checks status every 30 min, reports.
Morning: everything wired, tested, parity checks passing. I review with a coffee...

Social•Apr 2, 2026

Pilot Ships with Short Video and GIF Demos

Pilot on delivery duty today. Cutting short videos and gifs to show how it ships. https://github.com/qf-studio/pilot

Social•Mar 31, 2026

Pilot v2.86.3 Adds Crash Cleanup, Dashboard Graph, Repo Migration

Pilot v2.86.3 released.
Fixed:
- Stale worktrees after OOM/SIGKILL never cleaned up (818MB each)
- Squash merges dropped PR titles → broke release tagging
- GoReleaser pointed to old repo after migration
New:
- Dashboard git graph follows active task's project
- Worktree cleanup on crash and...

Social•Mar 30, 2026

Top AI Coding Agent Ignored Despite Benchmark Victory

Built an AI agent that took #1 on Terminal-Bench 2.0 — "the industry benchmark for coding agents". 82.0% across 445 trials. Validated by the maintainer 3 days ago. "Ready to merge." Still not on the leaderboard. LinkedIn DM — no response. Discord —...

Social•Mar 27, 2026

Montenegro's Pilot AI Scores 82% on Industry Benchmark, Proving Hub Status

Next week I'm presenting at AIM Innovation Week in Podgorica. EU-backed event bringing together startups, investors, and corporates around Montenegro's innovation ecosystem. Showing how AI builds software autonomously. Our tool Pilot is scoring 82%+ on the global industry benchmark, built right here...

Social•Mar 26, 2026

Navigator: Top ClaudeCode Plugin for Structured Dev Workflows

Navigator is still #1 and the only ClaudeCode plugin I use 🧭 Built for experienced devs who care about roadmaps and thinking before execution. Screenshot is Nav loop mode in action: full cycle of execution, checks, tests before anything ships +...

Social•Mar 21, 2026

One Person + Claude Equals Whole Team Productivity

Anthropic’s team, from the inside-out view. 1 person + Claude = full team output. The top people already work like this: they manage the whole department’s effort through Claude, instead of managing the department to produce the effort.

Social•Mar 20, 2026

Removing Depth Limit Boosts Agent Success From 58% to 88%

Spent two weeks benchmarking Pilot on Terminal Bench 2.0. Ran 500+ tasks across 15 experiments. Built analysis pipelines. Measured variance. Compared agent behavior across pass vs fail runs. The fix that moved the needle? Removing one env var that forced maximum thinking...

Social•Mar 19, 2026

CLI Version Beats Prompts and Node Upgrades

Node 18 + ClaudeCode 2.1.72 is a cheat code 😉 We benchmark Pilot on Terminal Bench 2.0. 89 real coding tasks, Opus 4.6, Modal containers. Ran 10+ full experiments over two days. The CLI tool version matters more than prompt engineering, effort...

Social•Mar 12, 2026

Pilot Scores 10/10 on Terminal Bench 2.0 Pre-tests

Made a few updates to Pilot. Restarted Terminal Bench 2.0 pre-tests: 10/10 so far, 100% correctness 💪 This technology rocks https://pilot.quantflow.studio Two months of hard pushing, and look at this: amazing results.

Social•Mar 12, 2026

Pilot Hits 68.5% Benchmark, Surpassing Claude Code

First full benchmark run on Terminal-Bench 2.0: a 15h run. RESULTS: Pilot 68.5%, Claude Code 58%. +10.5 points, target achieved. Switched from Daytona to Modal after the infra kept choking on heavy tasks. Night and day difference. 27 failures left to investigate: 7 are OOM kills, 3 were...

Social•Mar 9, 2026

Reading Test File First Solved Pilot Debugging Delays

This test drove me crazy. It's solid proof that Pilot works, but each pass takes forever when you're debugging infra. 4 days...
- Python wrapper to run Pilot (Go) inside Harbor's benchmark harness
- Migrated to Daytona sandboxes
- ~50 failed attempts on config, wrapper...

Social•Mar 7, 2026

Switched to Daytona Claude, Opus Revived in Under a Minute

We’re still grinding through Harbor’s tests 🤦‍♂️ The overnight run died on my Mac, so I moved everything to Daytona’s Claude: an amazing service with a clean CLI, and Opus was back up in under a minute. I’ll keep you updated – next results...

Social•Mar 6, 2026

Pilot's First Harbor Benchmark Runs: 30–40 Minutes, About $1 Each

Focusing on Harbor’s benchmark to prove Pilot’s efficiency. The tests are fascinating, a real challenge 💪 and Pilot already has first results. Each run takes 30–40 minutes and costs about $1 for Pilot. Now waiting for the full report to see where we land...

Social•Mar 4, 2026

Pilot Continuously Learns, Optimizing PR Pipelines Automatically

Pilot doesn't just ship tickets — it learns from them 📘 Every PR review → pattern extraction. Every CI failure → error diagnosis. Every self-review → convention learning. Cross-project memory with confidence scoring and decay. v3 roadmap 👀 Outcome-based model routing — Pilot...

Social•Mar 2, 2026

Token Efficiency, Not Volume, Defines the ClaudeCode Edge

Everyone has ClaudeCode. The edge is how efficiently you spend tokens, not how much you spend. Agreed?

Social•Feb 24, 2026

AI Plans a SOC2 Auth Service as 35 Issues

Asked Opus 4.6 to design an SOC2‑compliant auth service from zero. It came back with 35 issues. Pilot’s job now is to deliver them. Estimated cost: ~$4. Estimated time: ~1 hour + ~10 minutes of cleanup. --- Devs only have jobs until I get better...

Social•Feb 23, 2026

ClaudeCode and Pilot: 2026’s Top AI Workspace

The best AI workspace in 2026? ClaudeCode + Pilot – AI automated delivery pipeline 🤌 https://pilot.quantflow.studio

Social•Feb 23, 2026

Self‑review, Quality Gates, and Auto‑fix Loop Proven Effective

Anthropic's new research is out, and a few of my hypotheses just got confirmed. 1. Self‑review and quality gates matter. When users get less critical with polished outputs, automated verification layers compensate for that human tendency. 2. The iteration finding also...

Social•Feb 20, 2026

Solo Dev Delivers 200+ Features in 3 Weeks

When the platform catches up to your product, you're building in the right direction. Anthropic just announced auto-merge, CI monitoring, and code review for Claude Code. Pilot has had this since day one — shipped 3 weeks ago. But we didn't stop there: -...

Social•Feb 20, 2026

Pilot v2.0 Launches Native Desktop App and Community

Two things shipping today. 🎉 Pilot v2.0.0 → Native desktop app — macOS, Windows, Linux. → Deployment pipelines — dev/stage/prod/custom. → 3 execution backends — Claude Code, OpenCode, Qwen Code. → 200+ features. Self-hosted. Open source. Download: github.com/alekspetrov/pilot/releases/tag/v2.0.0 (docs are coming, GitLab is down) 💬 Pilot Discord → Launching...

Social•Feb 18, 2026

CI Turbulence Survived; Add a Fasten‑seat‑belt Alert

Pilot hit some CI turbulence and was fighting a stall 😁 Didn’t crash, didn’t stop — update shipped after a full recovery. Definitely need a “fasten seat belts” light for that phase, thought it was just stuck circling.

Social•Feb 18, 2026

Pilot 1.40 Cuts Costs, Adds Smart Model Routing

Pilot v1.40.0 delivered 📦 > Sonnet 4.6 default for simple/medium tasks > 40% cost drop on most executions > Opus 4.6 reserved for complex work > Haiku stays on classifiers > near-Opus quality — preferred 59% over Opus 4.5 > smart routing: complexity detected, model matched model_routing.enabled:...
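
The release notes describe routing as "complexity detected, model matched," with Sonnet 4.6 for simple/medium work, Opus 4.6 for complex work, and Haiku on classifiers. Below is a rough Go sketch of that shape, not Pilot's implementation: a toy keyword/file-count heuristic stands in for the Haiku classifier, and `Task`, `classify`, `route`, the thresholds, and the model labels are all hypothetical placeholders (not API model IDs). The `model_routing.enabled` toggle is not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Complexity buckets used for routing.
type Complexity int

const (
	Simple Complexity = iota
	Medium
	Complex
)

// Task is a HYPOTHETICAL, minimal view of an issue handed to the executor.
type Task struct {
	Title        string
	Body         string
	FilesTouched int // estimated number of files the change will touch
}

// classify is a toy heuristic standing in for a real (e.g. Haiku-based)
// complexity classifier. Thresholds and keywords are illustrative only.
func classify(t Task) Complexity {
	text := strings.ToLower(t.Title + " " + t.Body)
	switch {
	case t.FilesTouched > 10,
		strings.Contains(text, "architecture"),
		strings.Contains(text, "migration"):
		return Complex
	case t.FilesTouched > 3:
		return Medium
	default:
		return Simple
	}
}

// route maps a task to a model label: the strongest model only for complex
// work, the cheaper default for everything else.
func route(t Task) string {
	if classify(t) == Complex {
		return "opus-4.6" // placeholder label
	}
	return "sonnet-4.6" // placeholder label; simple and medium tasks
}

func main() {
	tasks := []Task{
		{Title: "Fix typo in README", FilesTouched: 1},
		{Title: "Add pagination to list endpoint", FilesTouched: 5},
		{Title: "Database migration to new schema", FilesTouched: 4},
	}
	for _, t := range tasks {
		fmt.Printf("%-34s -> %s\n", t.Title, route(t))
	}
}
```

The cost saving comes from the default branch: most tickets never reach the expensive model, so a cheap classifier call up front pays for itself whenever it keeps a simple task off Opus.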

Social•Feb 14, 2026

Pilot Automates Your Roadmap: 133 Features in Two Weeks

Pilot v1.0.0 shipped 🎉 133 features. Built in 2 weeks. The last 22 issues of the v1.0 roadmap were executed by Pilot itself — decomposing epics, creating branches, running CI, merging PRs. → Label a ticket "pilot". Get a PR back. GitHub,...
