SecTor 2025 | What Happens When Your Digital Voice Clone Goes Rogue

Black Hat
Apr 17, 2026

Why It Matters

Had the feature shipped as designed, a stolen or maliciously trained personal‑voice model could have enabled large‑scale deep‑fake scams and undermined trust in digital communications, posing legal and financial risks for both users and Microsoft.

Key Takeaways

  • Microsoft’s “Speak for Me” prototype enables personal voice cloning for accessibility.
  • Training data is sent to Azure; the model is stored locally, encrypted with DPAPI.
  • Weak random seed and missing certificate pinning expose models to theft.
  • Backend flaws allow arbitrary voice training, path traversal, and data leakage.
  • Project cancelled due to irreparable security risks outweighing user benefits.
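
The missing certificate pinning noted above can be illustrated with a minimal Python sketch. It assumes the client ships with a known SHA-256 fingerprint of the server certificate and refuses any other certificate; the function names and the pinning scheme are illustrative assumptions, not the mechanism described in the talk.

```python
import hashlib
import hmac
import socket
import ssl


def cert_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint (lowercase hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the certificate fingerprint with the pinned value in constant time."""
    return hmac.compare_digest(cert_fingerprint(der_cert), pinned_sha256_hex.lower())


def fetch_server_cert(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Open a TLS connection (normal CA validation still applies) and
    return the server's leaf certificate in DER form."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


def download_allowed(host: str, pinned_sha256_hex: str) -> bool:
    """Hypothetical gate: only proceed with a model download if the pin matches."""
    return matches_pin(fetch_server_cert(host), pinned_sha256_hex)
```

Pinning is checked in addition to ordinary CA validation, so an attacker holding a certificate from any trusted CA still fails the fingerprint comparison.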

Summary

The presentation detailed Microsoft’s experimental "Speak for Me" feature, an accessibility tool that records a user’s voice before it deteriorates and later synthesizes speech in that personal voice. The workflow involves capturing voice samples, uploading them to Azure’s Custom Neural Voice service, training a model in the cloud, and downloading an encrypted model to the Windows client for use as a virtual microphone in apps like Teams.

During the security review, researchers uncovered multiple vulnerabilities across the stack. Locally, the custom‑voice SDK reused generic‑voice protections, offering only weak obfuscation; the embedded watermark used a predictable RNG seed. Network‑side, there was no certificate pinning and model packages were delivered as executable blobs. On the backend, attackers could supply arbitrary reference text to train any voice, exploit path‑traversal to access other users’ models, and trigger denial‑of‑service via push‑notification abuse. Additionally, unlimited model creation/deletion could generate significant Azure costs.
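
To see why a predictable RNG seed weakens a watermark, consider the following Python sketch. The summary does not say how the watermark was actually generated, so the bit-pattern scheme, the timestamp-like seed, and every function name here are assumptions chosen only to illustrate the general weakness.

```python
import random
import secrets
from typing import Iterable, List, Optional


def watermark_bits_weak(seed: int, n: int = 64) -> List[int]:
    """Derive a watermark pattern from a guessable seed (e.g. a timestamp)."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]


def watermark_bits_strong(n: int = 64) -> List[int]:
    """Derive a watermark pattern from the OS cryptographic RNG instead."""
    return [secrets.randbits(1) for _ in range(n)]


def recover_seed(observed: List[int], candidates: Iterable[int]) -> Optional[int]:
    """Attacker's view: try candidate seeds until one reproduces the pattern."""
    for seed in candidates:
        if watermark_bits_weak(seed, len(observed)) == observed:
            return seed
    return None


if __name__ == "__main__":
    secret_seed = 1_700_000_000  # assumed: a seed taken from the clock
    pattern = watermark_bits_weak(secret_seed)
    guess = recover_seed(pattern, range(secret_seed - 1000, secret_seed + 1000))
    print("seed recovered:", guess == secret_seed)
```

Once the seed is known, the pattern can be reproduced, and anyone who can reproduce a watermark is in a position to strip or forge it. A pattern drawn from `secrets` offers no comparably small search space.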

One scenario discussed was a malicious actor cloning a victim’s voice from publicly available recordings and using the model to impersonate them in phone scams, a problem the presenter illustrated with a personal anecdote. Encryption keys were not kept in Azure Key Vault, and blob storage was global with no per‑user permissions, which amplified the risk of mass model theft and remote code execution.
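
The per-user isolation that was missing, together with the path-traversal issue mentioned earlier, can be sketched as a path check on the server side. This is a minimal Python illustration assuming models are stored under a directory per user; the layout, the function name, and the parameters are hypothetical and not taken from the talk.

```python
from pathlib import Path


def resolve_user_model_path(storage_root: str, user_id: str, model_name: str) -> Path:
    """Resolve a model path and reject anything outside the caller's own directory."""
    root = Path(storage_root).resolve()
    user_dir = (root / user_id).resolve()

    # The user directory must be a direct child of the storage root,
    # which rejects user IDs such as "", "..", or "a/b".
    if user_dir.parent != root:
        raise PermissionError("invalid user id")

    candidate = (user_dir / model_name).resolve()
    try:
        # relative_to raises ValueError if candidate is not under user_dir.
        candidate.relative_to(user_dir)
    except ValueError:
        raise PermissionError("model path escapes the user's directory")

    if candidate == user_dir:
        raise PermissionError("model name is empty")
    return candidate
```

A request for `../bob/voice.bin` made on behalf of `alice` resolves to a path outside `alice`'s directory and is rejected, rather than silently returning another user's model.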

Ultimately, the feature was shelved because the defensive technology and mitigation strategies were insufficient to prevent deep‑fake abuse and reputational damage. The case underscores the broader challenge of balancing innovative AI‑driven accessibility tools with robust security safeguards before public release.

Original Description

"Speak for Me" was envisioned as a Windows accessibility feature designed to replicate a user's voice with just a few samples, storing it locally as an AI model trained on the user's voice. This innovative feature aimed to enhance the existing Text-To-Speech interface, offering capabilities such as creating a virtual microphone for seamless use in conferencing apps like Microsoft Teams. Our team performed an internal security audit of this feature, revealing that it is very hard to protect. The potential attacks spanned multiple vectors. Ultimately, our audit led to this feature being released with the Custom Neural Voices (CNV) Azure service only. In this session, we will walk you through the various attack scenarios and vulnerabilities found, showcasing the difficulties of protecting AI-based user voices on client devices.
We will start our presentation with a number of critical vulnerabilities discovered in the project. These include classical remote code execution on the victims' machines, but more interestingly, either directly stealing the model itself, or abusing the cloud infrastructure to obtain a model of an arbitrary persona. Both the client and web sides of the app had multiple defensive mechanisms, such as consent voice recording, model encryption, and watermarking embedded into voice samples, that were supposed to prevent the infrastructure from being abused by bad actors to produce deepfakes. All of these could easily be bypassed, and ultimately the attacker could gain the ability to impersonate a victim with relatively low effort.
This project will serve as a case study to demonstrate the challenges and vulnerabilities of AI security on devices, particularly on generic Windows platforms that were not designed to protect highly sensitive AI models. We will examine the current state of the Windows security ecosystem and its relevance to AI model security.
By: Andrey Markovytch | Senior Security Researcher, Microsoft
Presentation Materials Available at:
