
Report: Massive Amounts of Sensitive Data Being Shared with GenAI Tools
Why It Matters
The unchecked leakage threatens regulatory fines, competitive advantage, and long‑term brand trust, making AI data governance a critical priority for enterprises.
By Michael Vizard on January 15, 2026
A report published today by Harmonic Security suggests that the amount of data being shared with generative artificial intelligence (AI) tools is increasing exponentially, in ways that will inevitably lead to more security breaches and compliance issues.
An analysis of 22.4 million prompts used across six generative AI applications in 2025 finds that data exposures most commonly involve ChatGPT (71%). Additionally, 17% of all the exposures discovered involved personal or free accounts, where organizations have zero visibility, no audit trails and where data may be used to train public models. Of the 98,034 sensitive-data exposures traced to those personal and free accounts, the vast majority (87%) occurred via ChatGPT Free, followed by Google Gemini at 5,935 (6%), Microsoft Copilot at 3,416 (3.5%), Claude at 2,412 (2.5%) and Perplexity at 1,245 (1.3%).
Of the 22.4 million prompts analyzed, 579,000 (2.6%) contained company-sensitive data. Code, with 30% of data exposures, was the leading risk, followed by legal discourse (22.3%), merger and acquisition data (12.6%), financial projections (7.8%) and investment portfolio data (5.5%).
Michael Marriott, vice president of product marketing for Harmonic Security, said most of the sensitive data discovered was inadvertently shared when some type of unstructured document was exposed to a generative AI model. No personally identifiable information or proprietary file content actually left a customer environment in the course of the analysis.
However, the analysis spans only six tools. When you consider that there are at least 661 applications with generative AI capabilities, the amount of potentially sensitive data being shared is significantly greater than many organizations fully appreciate, noted Marriott.
Additionally, a significant percentage of that data is finding its way into repositories that reside in data centers in countries that don't respect data privacy, he added. For example, 4% of the generative AI usage that Harmonic Security was able to track involved applications storing data in China, and many of the applications in use don't make it apparent where the data they collect is being stored.
The challenge cybersecurity teams face is that data shared with a generative AI tool might not surface as a known breach for months or years. Some providers of these tools use the data they collect to train the next generation of their AI models unless an end user has specifically opted out. As such, it's probable that much of that sensitive data will one day show up in output generated by an AI model, creating a breach that could result in significant fines.
Ideally, more organizations would rely on commercial versions of these tools that have guardrails in place to help prevent sensitive data from being shared. Even then, many of those guardrails can be bypassed, so there is also a need to monitor how employees are actually using these tools to keep data-leakage incidents to a minimum.
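To make that monitoring point concrete, the following is a minimal sketch in Python of the kind of outbound prompt filter such a monitoring layer might apply before a prompt ever reaches a generative AI tool. The category names and regular expressions are illustrative assumptions, not part of the Harmonic Security report, and a production system would rely on far more sophisticated classification.

```python
import re

# Illustrative sketch of an outbound prompt filter. The categories and
# regex patterns below are hypothetical examples, not Harmonic Security's
# methodology.
SENSITIVE_PATTERNS = {
    "api_key": re.compile(r"\b(?:sk|AKIA)[A-Za-z0-9_\-]{16,}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def scan_prompt(prompt: str) -> list[str]:
    """Return the names of sensitive-data categories detected in a prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]


def guard_prompt(prompt: str) -> str:
    """Raise if a prompt appears to contain sensitive data; otherwise pass it through."""
    hits = scan_prompt(prompt)
    if hits:
        # A real deployment might redact the match, warn the user, or route
        # the request to an approved enterprise tool instead of blocking it.
        raise ValueError(f"prompt blocked, possible sensitive data: {', '.join(hits)}")
    return prompt


if __name__ == "__main__":
    try:
        guard_prompt("Draft a reply to jane.doe@example.com about the Q3 projections")
    except ValueError as err:
        print(err)  # prompt blocked, possible sensitive data: email
```

Simple pattern matching of this kind catches only the most obvious exposures, such as keys, emails and identifiers; flagging categories like merger and acquisition material or financial projections requires content classification rather than regular expressions.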
After all, the only thing worse than the actual breach itself is knowing that data is readily accessible to anyone who cares to craft a simple prompt to retrieve it anytime they like.