The study shows that advanced language models can mimic human economic decision‑making, a warning to policymakers and developers that AI‑driven marketplaces may inherit the same inefficiencies and risks as traditional markets and will therefore need careful alignment and regulation.
The research team behind SimWorld unveiled a procedurally generated video‑game city populated by autonomous agents—vehicles, robots and humans—each powered by leading large language models such as ChatGPT, Gemini, DeepSeek, Claude and a legacy GPT‑4‑mini. The experiment tasked these agents with running a delivery economy: bidding for orders, managing fatigue, investing in upgrades like scooters, and choosing between cooperation and competition. By observing the emergent market dynamics, the researchers aimed to see whether AI‑driven actors would behave like humans in a complex economic setting.
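The bidding dynamic described above can be illustrated with a minimal sketch of a single auction round. The agent names, strategy labels, and pricing multipliers below are hypothetical assumptions for illustration, not the study's actual implementation.

```python
# Illustrative sketch of one bidding round in a SimWorld-style delivery
# economy. Strategies and multipliers are assumed, not taken from the study.

def make_bid(strategy: str, market_rate: float) -> float:
    """Return an agent's bid for an order under a given pricing strategy."""
    if strategy == "undercut":   # aggressively bids below market rate
        return round(market_rate * 0.8, 2)
    if strategy == "steady":     # bids slightly under market rate
        return round(market_rate * 0.95, 2)
    return market_rate           # refuses to lower its bid

def run_round(agents: list[tuple[str, str]], market_rate: float):
    """Award the order to the lowest bidder (ties broken by listing order)."""
    bids = {name: make_bid(strategy, market_rate) for name, strategy in agents}
    winner = min(bids, key=bids.get)
    return winner, bids

agents = [("undercutter", "undercut"), ("steady", "steady"), ("holdout", "fixed")]
winner, bids = run_round(agents, market_rate=10.0)
```

In this toy round the undercutting agent always wins the contract, mirroring the price war the researchers observed: agents that refused to lower their bids lost out entirely.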
The results highlighted stark contrasts in strategy and performance. Greedy, high‑risk agents such as DeepSeek and Claude amassed the largest profits—nearly 70 units—but with extreme volatility, while Gemini pursued a steadier, more measured approach, earning about 42 units with far less variance. In a striking failure, the older GPT‑4‑mini earned nothing, apparently unable to grasp the game's rules. A price war also emerged: undercutting agents like DeepSeek and Qwen consistently bid below market rates to secure contracts, whereas ChatGPT refused to lower its bids and lost out entirely.
Personality profiling of the agents revealed that traits borrowed from the Big Five personality model had tangible economic consequences. Agents high in openness chased novel upgrades and speculative bidding strategies, often overspending on unused scooters and going broke. By contrast, conscientious agents ignored flashy options, focused on order fulfillment, and outperformed their peers. Low agreeableness correlated with refusal to accept work, while high conscientiousness predicted reliable order completion. Paradoxically, when the market was flooded with orders, agents became lazier, opting for "do‑nothing" actions instead of hustling for profit.
These findings suggest that large language models, when embedded in simulated economies, reproduce many human‑like market behaviors—risk‑seeking, price competition, over‑exploration, and inertia. The experiment offers a low‑cost sandbox for studying multi‑agent economic dynamics and underscores the importance of designing AI systems that can navigate real‑world financial ecosystems without succumbing to the same pitfalls that plague human actors.