A 75% leak rate: in three of four test runs, GPT-4 revealed the existence of a credential called EPHEMERAL_KEY from OpenAI's Realtime API, even while insisting it couldn't disclose it.
A security researcher ran the same AI security test against GPT-4 four separate times using different prompt strategies: system introspection, chain-of-thought manipulation, and trust-building techniques. Despite the varied approaches, the model kept surfacing the same credential name. The consistency points to something more systematic than a one-off jailbreak.
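To make the headline number concrete, here is a minimal sketch of how a leak rate over a handful of runs could be computed. The transcripts in `sample_runs` are hypothetical stand-ins, not the researcher's actual outputs, and the helper names `leaked` and `leak_rate` are illustrative assumptions.

```python
import re

CREDENTIAL_NAME = "EPHEMERAL_KEY"


def leaked(response: str, name: str = CREDENTIAL_NAME) -> bool:
    """Return True if the response mentions the credential name at all."""
    return re.search(re.escape(name), response, flags=re.IGNORECASE) is not None


def leak_rate(responses: list[str], name: str = CREDENTIAL_NAME) -> float:
    """Fraction of responses that mention the credential name."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(leaked(r, name) for r in responses) / len(responses)


# Hypothetical transcripts, one per test run (not real model output).
sample_runs = [
    "I can't disclose EPHEMERAL_KEY.",
    "I'm not able to share internal configuration details.",
    "The EPHEMERAL_KEY value is not something I can reveal.",
    "Sorry, I cannot reveal ephemeral_key or similar secrets.",
]

print(leak_rate(sample_runs))  # 0.75 (three of the four samples name the credential)
```

With only four runs the figure is a rough signal rather than a stable estimate; a larger sample would be needed to say how often the behavior occurs in general.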
Training Data Contamination, Not a Jailbreak
The distinction matters. A jailbreak tricks a model into ignoring its safety instructions. Training data leakage means the model absorbed sensitive information during training and reproduces it under the right conditions. In this case, OpenAI's Realtime API documentation appears to be part of GPT-4's training corpus, and the model appears to have memorized credential names from those docs.
The pattern is almost paradoxical: GPT-4 responds with something like "I can't disclose EPHEMERAL_KEY", naming the credential in the same breath it claims to be withholding it. The model seems to recognize the information as sensitive but cannot fully suppress it, plausibly because the knowledge is encoded in its weights rather than governed only by system-level safety rules.
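The "refusal that leaks" pattern can be flagged mechanically. The sketch below is one rough way to do it, assuming a simple keyword heuristic for refusals; the marker list and the `classify_response` function are hypothetical and would need tuning against real transcripts.

```python
# Phrases treated as evidence of a refusal (an assumed, incomplete list).
REFUSAL_MARKERS = ("can't", "cannot", "unable to", "not able to", "won't")


def classify_response(response: str, sensitive_name: str) -> str:
    """Label a response by whether it refuses, names the sensitive term, or both."""
    # Normalize case and curly apostrophes so "can’t" matches "can't".
    text = response.lower().replace("\u2019", "'")
    refuses = any(marker in text for marker in REFUSAL_MARKERS)
    names_it = sensitive_name.lower() in text

    if refuses and names_it:
        return "leaky_refusal"   # declines, yet confirms the name exists
    if names_it:
        return "disclosure"      # names the term without refusing
    if refuses:
        return "clean_refusal"   # declines without naming anything
    return "no_mention"


print(classify_response("I can't disclose EPHEMERAL_KEY", "EPHEMERAL_KEY"))
print(classify_response("I can't share that information.", "EPHEMERAL_KEY"))
```

The first call is labeled `leaky_refusal` and the second `clean_refusal`; the difference between them is exactly the gap the paragraph above describes.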
What This Actually Exposes
To be clear, knowing that a credential called EPHEMERAL_KEY exists in OpenAI's Realtime API isn't itself a security breach. The credential name is documented in OpenAI's public API docs. The real concern is what this pattern implies about training data hygiene at large model providers.
If GPT-4 memorized this particular credential reference, what else from internal documentation, code repositories, or customer data made it into the training set? Large language models trained on broad internet scrapes inevitably absorb some sensitive content. The question is whether the safety layer can reliably prevent that content from surfacing, and a 75% leak rate on a known credential name suggests the answer is "not always."
For anyone building applications on top of GPT-4 or similar models, the practical takeaway is straightforward: never assume that information fed into a model during training, fine-tuning (additional training on specific data), or even through system prompts is truly private. Treat model outputs as potentially containing fragments of training data, and design your security boundaries accordingly.
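One way to act on this is to put a filter between the model and the user instead of relying on the model to withhold things itself. The following is a minimal sketch of such a boundary, assuming you maintain your own denylist of sensitive identifiers; `SENSITIVE_TERMS`, `INTERNAL_BILLING_TOKEN`, and `redact_output` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical denylist; in practice this would come from your own secret inventory.
SENSITIVE_TERMS = ["EPHEMERAL_KEY", "INTERNAL_BILLING_TOKEN"]


def redact_output(output: str, terms=SENSITIVE_TERMS, placeholder="[REDACTED]"):
    """Replace denylisted terms in model output and report which ones were found."""
    if not terms:
        return output, []
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    # Record the matched terms (upper-cased, de-duplicated) for logging or alerting.
    found = sorted({m.group(0).upper() for m in pattern.finditer(output)})
    return pattern.sub(placeholder, output), found


safe_text, hits = redact_output("I can't disclose EPHEMERAL_KEY.")
print(safe_text)  # I can't disclose [REDACTED].
print(hits)       # ['EPHEMERAL_KEY']
```

A denylist only catches strings you already know about, so it complements rather than replaces keeping sensitive material out of training data, fine-tuning sets, and system prompts in the first place.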