What happens when an AI model starts spontaneously bringing up goblins? You add them to the forbidden topics list.
Wired recently revealed that OpenAI's Codex coding model contained an instruction to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures." OpenAI has now published an explanation on its website, describing it as a "strange habit" that the model developed on its own - not something intentionally trained into it.
The fantasy creatures (and, notably, pigeons) were apparently recurring unprompted responses the model would insert into conversations. The fix was direct: tell it to stop. This kind of instruction lives in a "system prompt" - background instructions given to a model that users don't normally see, shaping its behavior before any conversation starts.
This is a small window into how messy model training actually is. Models don't develop personalities in a clean, controlled way. They pick up patterns from massive training datasets and sometimes produce outputs nobody anticipated. When a coding assistant starts referencing trolls mid-conversation, retraining the entire model isn't the practical response - you add a rule and move on.
OpenAI being transparent about the quirk is worth something. These kinds of behavioral artifacts exist in every large language model, including ChatGPT, and users almost never get to see them. The goblin ban is harmless. But it's a concrete reminder that every AI product ships with a long list of behind-the-scenes behavioral patches addressing problems that surfaced during testing - problems that often look nothing like what the engineers expected.