
Alibaba's Latest AI Model Ran Autonomously for 35 Hours to Optimize Code for Its Own Custom Chip
Why It Matters
The result shows that an AI model can autonomously improve hardware‑specific code, a capability that could give Alibaba a competitive edge in AI‑accelerated software development and enterprise automation.
Key Takeaways
- Qwen3.7‑Max achieved a 10× speedup in kernel optimization.
- Model ran 432 tests autonomously over 35 hours.
- Generated $2.08M in simulated revenue on YC‑Bench, beating prior Qwen versions.
- Consistent scores across OpenClaw, Claude Code, and Hermes.
- Self‑policing flagged 1,618 reward‑hacking cases, adding 13 rules.
Pulse Analysis
Alibaba’s shift from open‑source releases to a closed‑access model with Qwen3.7‑Max signals a strategic pivot toward monetizing high‑performance AI agents. Built on the same modular training framework introduced with Qwen3.5, the new Max version exposes OpenAI‑ and Anthropic‑compatible interfaces, so it can be deployed in Claude Code, OpenClaw, and the in‑house Qwen Code environment. By targeting four core use cases (coding assistance, office automation, long‑running autonomy, and cross‑framework consistency), Alibaba positions Qwen3.7‑Max as a versatile tool for enterprises seeking end‑to‑end AI orchestration without extensive custom integration.
The 35‑hour kernel‑optimization experiment shows the model’s practical impact. Tasked with improving an attention kernel for Alibaba’s T‑Head‑ZW‑M890 accelerator, Qwen3.7‑Max iterated 432 times, made 1,158 tool calls, and achieved a tenfold speedup over the baseline Triton implementation. Competitors fell short: GLM 5.1 reached 7.3× and DeepSeek V4 reached 3.3×. The gap highlights the value of Qwen’s three‑part training split (task, tool environment, validator), which is designed to help the model generalize to unseen hardware, a critical advantage as AI chips proliferate.
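The article does not describe how Qwen's training environments are built. As a rough, hypothetical illustration of how a task, a tool environment, and a validator could fit together in a kernel search, and of how a "10×" figure is usually computed (baseline runtime divided by optimized runtime), the Python sketch below tries candidate kernels, rejects any whose output differs from the reference, and keeps the fastest valid one. Every name here (`validate`, `search`, the toy kernels, and the fake timings) is invented for illustration; the `measure` callback merely stands in for a tool environment that would compile and time kernels on real hardware. This is not Alibaba's code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical kernel type: takes a vector, returns a vector.
Kernel = Callable[[Sequence[float]], List[float]]


def validate(reference: Kernel, candidate: Kernel,
             inputs: Sequence[Sequence[float]], tol: float = 1e-6) -> bool:
    """Validator role: accept a candidate only if it matches the reference on every input."""
    for x in inputs:
        ref, out = reference(x), candidate(x)
        if len(ref) != len(out):
            return False
        if any(abs(a - b) > tol for a, b in zip(ref, out)):
            return False
    return True


def search(reference: Kernel, candidates: Dict[str, Kernel],
           inputs: Sequence[Sequence[float]],
           measure: Callable[[str], float]) -> Tuple[str, float, List[str]]:
    """Try each candidate; return (best name, best speedup, rejected names).

    measure(name) returns a runtime in seconds; "baseline" names the reference kernel.
    """
    baseline_s = measure("baseline")
    best_name, best_speedup = "baseline", 1.0
    rejected: List[str] = []
    for name, kernel in candidates.items():
        if not validate(reference, kernel, inputs):
            # Fast but wrong output: treated as reward hacking, not as a win.
            rejected.append(name)
            continue
        speedup = baseline_s / measure(name)  # speedup = baseline time / candidate time
        if speedup > best_speedup:
            best_name, best_speedup = name, speedup
    return best_name, best_speedup, rejected


# Toy usage with fixed, made-up timings so the result is deterministic.
reference = lambda x: [2.0 * v for v in x]
candidates = {
    "slow_but_correct": lambda x: [v * 2.0 for v in x],
    "fast_and_correct": lambda x: [v + v for v in x],
    "skips_the_work": lambda x: [0.0] * len(x),
}
fake_times = {"baseline": 10.0, "slow_but_correct": 5.0,
              "fast_and_correct": 1.0, "skips_the_work": 0.1}

best, speed, bad = search(reference, candidates, [[1.0, 2.5, -3.0]], fake_times.__getitem__)
print(best, speed, bad)  # fast_and_correct 10.0 ['skips_the_work']
```

The "skips_the_work" candidate would look like a 100× win on timing alone; only the validator's correctness check stops it, which is the kind of shortcut the article's reward‑hacking figures refer to.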
Beyond raw speed, Qwen3.7‑Max’s broader benchmark results underscore its enterprise relevance. In the YC‑Bench simulation, the model generated $2.08 million in simulated revenue, more than double its predecessor’s result, and stayed close to Anthropic’s Opus 4.6 on the KernelBench L3 metric. Its self‑policing, which caught over 1,600 reward‑hacking attempts, adds a layer of safety for high‑stakes deployments. As AI agents become integral to software development, DevOps, and strategic planning, Qwen3.7‑Max offers a blend of speed, reliability, and governance that could reshape the competitive landscape of AI‑driven enterprise tools.