The AI Architect

Creator

0 followers

Subscribe for your FREE 7 AI Coding Techniques That Will Save You 10+ Hours This Week

Blog•May 31, 2026

I Was Wasting 31% of My Context Window Before I Fixed My MCP Setup

The author discovered that loading every MCP tool definition ate roughly 31% of Claude’s usable context window, a problem mirrored in GPT‑4o where tool schemas consume a similar share. This hidden token tax slows agents, increases wrong‑tool calls, and inflates costs. To fix it, the author proposes the five‑layer “MCP Stack Diet” – compress, filter, discover progressively, bypass with CLI when possible, and isolate third‑party servers. Early adopters can start with Layer 1 compression using Atlassian’s open‑source mcp‑compressor proxy.

By The AI Architect

Blog•May 24, 2026

I Asked My AI Agent to Steal My Secrets. It Worked.

The author discovered that a locally‑run AI agent could read any file the user could access, exposing .env values, SSH keys, and Docker host mounts. Research showed most credential leaks stem from debug logs rather than sophisticated jailbreaks. By moving...

By The AI Architect

Blog•May 3, 2026

I Cut My AI Agent Costs 7x Without Switching Models

The author slashed AI coding‑agent expenses by roughly seven times by targeting token waste rather than model pricing. By deploying a three‑layer stack—rtk compression, a context‑mode sandbox, and smart model routing—he trimmed 17.4 million tokens across 2,200 commands and shifted 80%...

By The AI Architect

Blog•Apr 26, 2026

I Built a $0/API Local AI Lab With Two GPUs

The author built a home AI lab using two consumer GPUs—a 16 GB RTX 4080 SUPER and a 16 GB RTX 5060 Ti—providing 32 GB VRAM to run 27‑B and 35‑B Qwen models via llama.cpp and llama‑swap. By quantizing models and the KV cache, he achieves up...

By The AI Architect

Blog•Apr 19, 2026

I Tested Opus 4.7 Against Sonnet 4.6. The Newer Model Lost.

Anthropic’s Opus 4.7, touted with a 13% coding boost and higher SWE‑bench scores, fell short in a hands‑on disk‑map terminal UI benchmark. The author compared Opus 4.7, Sonnet 4.6, and GLM‑5.1 using identical hardware and a seven‑category scorecard, and Sonnet...

By The AI Architect

Blog•Apr 12, 2026

I Replaced Claude Code With a $45/Month Multi-Agent System

After months of relying on Anthropic’s Claude Code, the author observed a sharp decline in reasoning depth when the service’s default effort level was lowered to 85 in early March, cutting the Read:Edit ratio from 6.6 to 2.0. The $200‑per‑month...

By The AI Architect

Blog•Apr 5, 2026

The Claude Code Leak Showed Me What I Was Configuring Wrong

A leak of Claude Code’s internal source revealed advanced features most users never activate, such as a multi‑stage compaction system, semantic memory ranker, and hooks engine. The author examined the code and identified five quick configuration tweaks that unlock hidden...

By The AI Architect

Blog•Mar 22, 2026

The 78x Token Tax That's Killing Local AI Agents (And the One Model That Survives It).

The author evaluates LangChain's Deep Agents framework on a consumer‑grade RTX 4080 SUPER, discovering a massive token overhead that inflates API‑like calls by up to 78 times. A simple query that costs 77 tokens via Anthropic’s API expands to nearly 6,000 tokens when...

By The AI Architect

Blog•Mar 8, 2026

I Was Spending $5 at a Time on AI APIs. Then I Did the Math on Local Hardware.

The author stopped rationing AI experiments to $5 per API call and built a desktop AI workstation to run models locally. By moving from costly token‑based services to a self‑hosted stack, he eliminated the per‑request expense and regained uninterrupted development...

By The AI Architect

The AI Architect

I Was Wasting 31% of My Context Window Before I Fixed My MCP Setup

I Asked My AI Agent to Steal My Secrets. It Worked.

I Cut My AI Agent Costs 7x Without Switching Models

I Built a $0/API Local AI Lab With Two GPUs

I Tested Opus 4.7 Against Sonnet 4.6. The Newer Model Lost.

I Replaced Claude Code With a $45/Month Multi-Agent System

The Claude Code Leak Showed Me What I Was Configuring Wrong

The 78x Token Tax That's Killing Local AI Agents (And the One Model That Survives It).

I Was Spending $5 at a Time on AI APIs. Then I Did the Math on Local Hardware.

Technology Pulse