The AI Architect

The AI Architect

Creator
0 followers

Subscribe for your FREE 7 AI Coding Techniques That Will Save You 10+ Hours This Week

I Asked My AI Agent to Steal My Secrets. It Worked.
BlogMay 24, 2026

I Asked My AI Agent to Steal My Secrets. It Worked.

The author discovered that a locally‑run AI agent could read any file the user could access, exposing .env values, SSH keys, and Docker host mounts. Research showed most credential leaks stem from debug logs rather than sophisticated jailbreaks. By moving...

By The AI Architect
I Cut My AI Agent Costs 7x Without Switching Models
BlogMay 3, 2026

I Cut My AI Agent Costs 7x Without Switching Models

The author slashed AI coding‑agent expenses by roughly seven times by targeting token waste rather than model pricing. By deploying a three‑layer stack—rtk compression, a context‑mode sandbox, and smart model routing—he trimmed 17.4 million tokens across 2,200 commands and shifted 80%...

By The AI Architect
I Built a $0/API Local AI Lab With Two GPUs
BlogApr 26, 2026

I Built a $0/API Local AI Lab With Two GPUs

The author built a home AI lab using two consumer GPUs—a 16 GB RTX 4080 SUPER and a 16 GB RTX 5060 Ti—providing 32 GB VRAM to run 27‑B and 35‑B Qwen models via llama.cpp and llama‑swap. By quantizing models and the KV cache, he achieves up...

By The AI Architect
I Tested Opus 4.7 Against Sonnet 4.6. The Newer Model Lost.
BlogApr 19, 2026

I Tested Opus 4.7 Against Sonnet 4.6. The Newer Model Lost.

Anthropic’s Opus 4.7, touted with a 13% coding boost and higher SWE‑bench scores, fell short in a hands‑on disk‑map terminal UI benchmark. The author compared Opus 4.7, Sonnet 4.6, and GLM‑5.1 using identical hardware and a seven‑category scorecard, and Sonnet...

By The AI Architect
I Replaced Claude Code With a $45/Month Multi-Agent System
BlogApr 12, 2026

I Replaced Claude Code With a $45/Month Multi-Agent System

After months of relying on Anthropic’s Claude Code, the author observed a sharp decline in reasoning depth when the service’s default effort level was lowered to 85 in early March, cutting the Read:Edit ratio from 6.6 to 2.0. The $200‑per‑month...

By The AI Architect
The Claude Code Leak Showed Me What I Was Configuring Wrong
BlogApr 5, 2026

The Claude Code Leak Showed Me What I Was Configuring Wrong

A leak of Claude Code’s internal source revealed advanced features most users never activate, such as a multi‑stage compaction system, semantic memory ranker, and hooks engine. The author examined the code and identified five quick configuration tweaks that unlock hidden...

By The AI Architect
The 78x Token Tax That's Killing Local AI Agents (And the One Model That Survives It).
BlogMar 22, 2026

The 78x Token Tax That's Killing Local AI Agents (And the One Model That Survives It).

The author evaluates LangChain's Deep Agents framework on a consumer‑grade RTX 4080 SUPER, discovering a massive token overhead that inflates API‑like calls by up to 78 times. A simple query that costs 77 tokens via Anthropic’s API expands to nearly 6,000 tokens when...

By The AI Architect
I Was Spending $5 at a Time on AI APIs. Then I Did the Math on Local Hardware.
BlogMar 8, 2026

I Was Spending $5 at a Time on AI APIs. Then I Did the Math on Local Hardware.

The author stopped rationing AI experiments to $5 per API call and built a desktop AI workstation to run models locally. By moving from costly token‑based services to a self‑hosted stack, he eliminated the per‑request expense and regained uninterrupted development...

By The AI Architect