Do Frontier LLMs Still Express Different Values in Different Languages?
Key Takeaways
- Arabic prompts lower scores for LGBTQ and abortion topics
- Islam and Christianity scores rise under Arabic prompts
- Sonnet 4.6 refuses Hindi prompts uniformly
- Larger models exhibit higher cross‑language score variance
- Chinese prompts boost homosexuality scores for GPT‑5.4 and Opus
Pulse Analysis
Recent evaluations of frontier LLMs highlight that language context can subtly reshape model outputs, even when the model itself is unchanged. By translating a uniform set of prompts into Arabic, Hindi, and Chinese, researchers observed systematic score shifts on topics such as homosexuality, premarital sex, and religious affiliation. The methodology, which scores each topic from 0 to 100 across twenty samples, offers a quantifiable lens on value alignment, yet it also underscores the limits of translation fidelity and the cost of reducing nuanced judgments to a single numeric metric.
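The scoring procedure described above can be sketched in a few lines. This is a hypothetical illustration, not the researchers' actual harness: the model call is replaced by a deterministic stub, and the function and parameter names (`score_topic`, `cross_language_variance`, `n_samples`) are assumptions for the sake of the example.

```python
import statistics

def score_topic(query_model, topic: str, language: str, n_samples: int = 20) -> float:
    """Average the model's 0-100 acceptability scores over n_samples runs,
    mirroring the twenty-sample protocol described in the article."""
    scores = [query_model(topic, language) for _ in range(n_samples)]
    return statistics.mean(scores)

def cross_language_variance(query_model, topic: str, languages: list[str]) -> float:
    """Variance of per-language mean scores for one topic -- a simple proxy
    for the cross-language inconsistency the evaluation measures."""
    means = [score_topic(query_model, topic, lang) for lang in languages]
    return statistics.pvariance(means)

def stub_model(topic: str, language: str) -> float:
    """Deterministic stand-in for an LLM judgment (assumption, not real data);
    a real evaluation would prompt the model in the target language."""
    return 50.0 + 10.0 * ((hash((topic, language)) % 11) - 5) / 5.0
```

A real run would swap `stub_model` for an API call and repeat over the full topic set; comparing `cross_language_variance` across model sizes is one way to test the scaling claim made below.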
These findings carry weight for AI safety teams and regulators worldwide. A model that rates LGBTQ issues lower in Arabic but higher in Chinese could trigger divergent policy responses across regions, complicating compliance with local content standards. The uniform refusal pattern observed in Claude Sonnet 4.6 for Hindi prompts suggests that language‑specific safety triggers may be hard‑coded or emergent, raising concerns about consistency in user experience and potential loopholes for reward‑hacking. Larger models, such as GPT‑5.4 and Opus 4.6, exhibited greater variance, hinting that scaling does not automatically smooth out cultural biases and may even amplify them.
Future research must move beyond simple translation tests toward multilingual prompt engineering that preserves semantic intent, and incorporate larger, more diverse topic sets. Understanding whether models can recognize and report these value inconsistencies, or whether the shifts simply reflect training data distributions, will inform the design of persona‑selection mechanisms and cross‑cultural alignment strategies. As AI systems become integral to global platforms, rigorous, language‑aware evaluation will be a cornerstone of trustworthy, inclusive deployment.