I’ve seen some odd AI system instructions in my day, but this one takes the cake: a prompt in OpenAI’s Codex command-line app that demands models “never talk about goblins, gremlins, trolls, ogres, pigeons, or other animals or creatures.” That’s a new one, and word of the head-turning instruction for OpenAI’s powerful GPT-5.5 quickly spread on Reddit, in Wired, and elsewhere.

So, what gives? Well, it turns out that OpenAI’s latest GPT models, all the way up to the most recent GPT-5.5 flagship, have displayed a clear habit of sprinkling goblins and other creatures into their replies, both in ChatGPT and in the Codex app, OpenAI explained in a blog post.

Digging deeper into the quirk, OpenAI engineers noticed that the goblins were more likely to show up in GPT’s “Nerdy” personality, which included the following line among its various instructions:

“You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed.”

Noticing the steadily increasing prevalence of “goblins” from GPT-5.2 to GPT-5.4, OpenAI coders developed a theory: that personality training was, over time, progressively reinforcing the model’s habit of mentioning the little creatures.

Even stranger, the OpenAI researchers noticed that GPT’s propensity for dropping references to “goblins” and “gremlins” was increasing even when users didn’t use the Nerdy personality. Could the “rewards” the model was getting for its playful “goblin” mentions under the Nerdy persona be spreading into later training sessions? The answer, as it turns out, is yes: later investigation found goblins, gremlins, and “a whole family of other odd creatures” in GPT-5.5’s supervised fine-tuning data, according to the OpenAI post.

OpenAI said it nixed the Nerdy personality back in March, but not before GPT-5.5 had already been trained; hence the crude, strongly worded ban on goblins and gremlins in the Codex CLI system prompt.

It’s wild stuff, but it also demonstrates once again the strange and often mysterious process of LLM training, where models gorge on mountains of data and are then fine-tuned to behave in a given way. The fine-tuning stage isn’t like a blueprint for a house, where you can determine the precise location of every door and window; instead, it’s more of a rewards-based system that sometimes leads to unexpected consequences. You know, like gremlins.
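OpenAI hasn’t published its training code, so here’s a toy Python sketch of the spillover idea in miniature, with every name, number, and reward entirely illustrative: a tiny “model” (a shared logit vector over four words) is trained with REINFORCE-style reward updates while a persona bias nudges it toward playful tokens. Because the rewarded updates land in the shared weights rather than the persona, the “goblin” habit persists even after the persona is switched off.

import numpy as np

# Toy illustration, NOT OpenAI's pipeline: one shared logit vector
# stands in for the model's weights; a persona adds a bias on top.
rng = np.random.default_rng(0)
vocab = ["the", "code", "works", "goblin"]
shared_logits = np.zeros(len(vocab))          # weights used by every persona
nerdy_bias = np.array([0.0, 0.0, 0.0, 1.0])   # "Nerdy" nudges playful tokens

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(token):
    # Hypothetical reward model that rates playful language highly.
    return 1.0 if token == "goblin" else 0.1

LR = 0.5
for step in range(200):
    probs = softmax(shared_logits + nerdy_bias)   # sample under the persona
    i = rng.choice(len(vocab), p=probs)
    # REINFORCE-style update: gradient of log-prob of the sampled token
    # is (one_hot - probs), scaled by reward. Crucially, the update
    # lands in the SHARED logits, not the persona bias.
    grad = -probs
    grad[i] += 1.0
    shared_logits += LR * reward(vocab[i]) * grad

# Sampling WITHOUT the persona bias still over-produces "goblin".
print(dict(zip(vocab, softmax(shared_logits).round(3))))

Run it and “goblin” dominates the persona-free distribution, which is the toy version of what OpenAI describes: rewards earned in one personality leaking into the base behavior every personality shares.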