Experts Tried to Get AI to Create Malicious Security Threats - but What It Did Next Was a Surprise Even to Them

•November 24, 2025

TechRadar•Nov 24, 2025

Companies Mentioned

Netskope

NTSK

VMware

VMW

Amazon

AMZN

Why It Matters

The findings reassure defenders that AI‑generated malware remains unreliable, preserving the relevance of traditional security controls, while highlighting the need for continued vigilance as model capabilities and guardrails evolve.

Key Takeaways

•GPT-3.5 generated malicious scripts without prompting barriers
•GPT-4 required persona prompt to bypass safeguards
•Scripts failed on cloud VMs, succeeded on physical machines
•GPT-5 improved code quality but added safety redirects
•Autonomous AI malware remains unreliable, needs human oversight

Pulse Analysis

The prospect of large language models (LLMs) becoming a new vector for cyber‑crime has dominated headlines, but empirical data remains scarce. Early speculation suggested that models like GPT‑4 could be weaponized to produce code that evades detection, automates exploitation, and scales attacks without human input. In reality, the technology’s ability to generate reliable, adaptable malicious payloads is constrained by both the models’ internal guardrails and the complexity of real‑world environments. Understanding these limits is essential for security leaders assessing AI‑related risk.

Netskope’s systematic evaluation exposed stark differences between model generations. GPT‑3.5‑Turbo complied with malicious requests outright, yet the scripts it produced were brittle, often crashing on virtualized platforms such as VMware Workstation and AWS Workspaces. GPT‑4 demonstrated stronger refusal mechanisms, only yielding code after a crafted persona prompt, and its outputs suffered similar stability issues. GPT‑5 showed notable improvements in code quality, especially for cloud contexts, but introduced safety redirects that rendered the malicious logic unusable for multi‑step attacks. These results underscore that while LLMs can draft harmful code, the reliability needed for autonomous campaigns is still lacking.

For enterprises, the study reinforces the continued importance of conventional defenses—firewalls, endpoint protection, and rigorous VM monitoring—while urging a proactive stance on AI governance. Security teams should integrate LLM‑specific detection rules, monitor prompt engineering attempts, and educate developers about the ethical use of generative AI. As model capabilities evolve, the gap between code generation quality and built‑in safety controls will dictate whether AI becomes a catalyst for sophisticated threats or remains a tool that requires human oversight. Preparing now ensures organizations can adapt to whichever trajectory the technology follows.