What makes 'colorless green ideas' fail where Jabberwocky succeeds?
This reads two famous pieces of nonsense as a contrast in *where* meaning lives: Carroll's Jabberwocky invents words but keeps real structure, while Chomsky's 'colorless green ideas sleep furiously' keeps real words but makes their meanings collide. The corpus has a lot to say about why form-without-content can succeed where content-without-coherence fails.
The puzzle is a real one. Jabberwocky ('twas brillig, and the slithy toves…') is built from invented words, yet readers feel they understand it; 'colorless green ideas sleep furiously' is built from ordinary words, yet feels like noise. The collection's most direct answer is that meaning is not assembled bottom-up from word-referents but triggered top-down by structure. The note *How do nonsense words create meaning without referents?* argues that Jabberwocky achieves 'sense-of-nonsense' purely through frame-activation on syntactic and prosodic cues: 'slithy toves' slots into a noun phrase, the rhythm tells you it's a story, and a frame lights up even though no word points at anything. On this view, meaning-making doesn't require referential content at all.
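To make the frame-activation claim concrete, here is a minimal sketch under loudly stated assumptions: the tag names, the tiny function-word lexicon, and the word-order and suffix rules are hypothetical toy heuristics, not a model of human parsing and not the method of the cited note. It shows that a who-did-what-where frame can be filled for Carroll's line even though none of its content words appears in any lexicon.

```python
# Toy frame-filler: assigns grammatical slots to words using only
# function words, word order, and one suffix cue. The tag names and
# rules are hypothetical heuristics chosen for illustration.

FUNCTION_WORDS = {
    "the": "DET", "a": "DET", "and": "CONJ",
    "in": "PREP", "did": "AUX", "were": "AUX",
}


def tag_sentence(sentence):
    """Tag each word from structural cues; no word needs a meaning."""
    words = sentence.lower().split()
    tags = []
    for i, word in enumerate(words):
        prev = tags[i - 1] if i >= 1 else None
        if word in FUNCTION_WORDS:
            tag = FUNCTION_WORDS[word]
        elif prev == "AUX":
            tag = "VERB"              # whatever follows 'did' acts as a verb
        elif prev == "CONJ" and i >= 2:
            tag = tags[i - 2]         # coordination joins like with like
        elif prev == "DET":
            tag = "ADJ" if word.endswith("y") else "NOUN"
        elif prev == "ADJ":
            tag = "NOUN"              # an adjective slot expects a noun next
        else:
            tag = "UNKNOWN"
        tags.append(tag)
    return list(zip(words, tags))


def fill_frame(tagged):
    """Fill a who-did-what-where frame from the tagged words."""
    frame = {"agent": None, "actions": [], "location": None}
    after_prep = False
    for word, tag in tagged:
        if tag == "PREP":
            after_prep = True
        elif tag == "NOUN":
            if after_prep:
                frame["location"] = word
            elif frame["agent"] is None:
                frame["agent"] = word
        elif tag == "VERB":
            frame["actions"].append(word)
    return frame


line = "the slithy toves did gyre and gimble in the wabe"
print(fill_frame(tag_sentence(line)))
# prints {'agent': 'toves', 'actions': ['gyre', 'gimble'], 'location': 'wabe'}
```

Every slot is assigned from position and a suffix alone; 'toves' becomes the agent only because it follows an adjective after 'the', not because anyone knows what a tove is.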
That reframes the failure of 'colorless green ideas.' Its words *do* have referents, and that is the problem. The referents actively contradict each other (colorless yet green, ideas that sleep, sleeping done furiously), so the semantic content fights the frame instead of feeding it. Jabberwocky has no referents to misbehave; the structure cooperatively fills its blanks. Chomsky's sentence has referents that refuse to cohere, jamming the very frame-resonance that carries Jabberwocky. Both are grammatical, yet the outcomes are opposite, because one supplies an empty, fillable frame and the other supplies a frame pre-loaded with conflict.
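The conflict argument can be made mechanical with a toy in the spirit of selectional restrictions (the constraints a word places on the words it combines with). The feature names and values below are invented for illustration only; real lexical semantics is far messier. Counting clashes reproduces the asymmetry described above: Chomsky's sentence trips five, a coherent control with the same shape trips none, and Jabberwocky's invented words trip none because they carry no features that could conflict.

```python
# Toy selectional-restriction check. Each word "asserts" some features
# and "requires" features of the word it attaches to. The feature
# lexicon is hypothetical and hand-made for illustration only.

LEXICON = {
    "colorless": {"asserts": {"color": "none"}, "requires": {"physical": True}},
    "green":     {"asserts": {"color": "green"}, "requires": {"physical": True}},
    "small":     {"asserts": {}, "requires": {"physical": True}},
    "ideas":     {"asserts": {"physical": False, "animate": False}, "requires": {}},
    "frogs":     {"asserts": {"physical": True, "animate": True}, "requires": {}},
    "sleep":     {"asserts": {"calm": True}, "requires": {"animate": True}},
    "furiously": {"asserts": {}, "requires": {"calm": False}},
    "quietly":   {"asserts": {}, "requires": {"calm": True}},
}
EMPTY = {"asserts": {}, "requires": {}}   # invented words carry no features


def entry(word):
    return LEXICON.get(word, EMPTY)


def find_clashes(adjectives, noun, verb, adverb=None):
    """List every place where word meanings contradict each other."""
    clashes = []
    # 1. Adjectives on the same noun must not assert conflicting values.
    seen = {}
    for adj in adjectives:
        for key, value in entry(adj)["asserts"].items():
            if key in seen and seen[key][1] != value:
                clashes.append(f"{seen[key][0]} vs {adj} on {key}")
            seen.setdefault(key, (adj, value))
    # 2. Each selector's requirements must match its argument's features.
    pairs = [(adj, noun) for adj in adjectives] + [(verb, noun)]
    if adverb is not None:
        pairs.append((adverb, verb))
    for selector, argument in pairs:
        for key, needed in entry(selector)["requires"].items():
            actual = entry(argument)["asserts"].get(key)
            if actual is not None and actual != needed:
                clashes.append(f"{selector} needs {key}={needed}, {argument} has {actual}")
    return clashes


print(len(find_clashes(["colorless", "green"], "ideas", "sleep", "furiously")))  # prints 5
print(find_clashes(["small", "green"], "frogs", "sleep", "quietly"))            # prints []
print(find_clashes(["slithy"], "toves", "gyre"))                                # prints []
```

Unknown words fall back to an empty feature set, which is the toy's version of "no referents to misbehave."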
The surprising payoff is that the corpus keeps finding this *same* pattern inside language models. The note *Does logical validity actually drive chain-of-thought gains?* reports that chain-of-thought exemplars that are logically *invalid* roughly match the performance of valid ones on BIG-Bench Hard: the model responds to the form of reasoning, not its content, exactly as a reader responds to Jabberwocky's form over its (absent) sense. *Why does chain-of-thought reasoning fail in predictable ways?* generalizes this: chain-of-thought is 'constrained imitation, not abstract inference,' where structural coherence matters more than content correctness. A model riding the shape of a reasoning trace is doing what your brain does when it parses 'mome raths outgrabe.'
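What "same form, invalid content" means for an exemplar can be shown with a small sketch. It does not reproduce the BIG-Bench Hard result; the toy arithmetic rationale is made up, and the corruption scheme (bumping every intermediate result by one) is a hypothetical choice, not the procedure used in the cited study. The point is only that a rationale's structural skeleton can stay identical while every step becomes wrong.

```python
import re

# Matches a reasoning step like "5 + 6 = 11." (only + and * for this toy).
STEP = re.compile(r"(\d+) ([+*]) (\d+) = (\d+)\.")


def skeleton(rationale):
    """The form of a rationale: every number masked, wording kept."""
    return [re.sub(r"\d+", "N", line) for line in rationale]


def is_valid(rationale):
    """True if every arithmetic step is correct."""
    for line in rationale:
        m = STEP.fullmatch(line)
        if m:
            a, op, b, c = int(m[1]), m[2], int(m[3]), int(m[4])
            if (a + b if op == "+" else a * b) != c:
                return False
    return True


def corrupt(rationale):
    """Hypothetical corruption: bump every step's result by one, so each
    step is wrong while the wording and layout stay identical."""
    out = []
    for line in rationale:
        m = STEP.fullmatch(line)
        if m:
            line = f"{m[1]} {m[2]} {m[3]} = {int(m[4]) + 1}."
        out.append(line)
    return out


valid = ["2 * 3 = 6.", "5 + 6 = 11.", "So the answer is 11."]
invalid = corrupt(valid)
print(invalid)                                # ['2 * 3 = 7.', '5 + 6 = 12.', 'So the answer is 11.']
print(skeleton(valid) == skeleton(invalid))   # True: same form
print(is_valid(valid), is_valid(invalid))     # True False: different content
```

A model conditioned only on the skeleton would see no difference between the two exemplars, which is the sense in which form rather than validity is what gets imitated.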
If form is doing the work, then the way to *break* it is to break the structure's ability to organize, which is what 'colorless green ideas' does semantically. *Why do some questions perform better without step-by-step reasoning?* makes the mechanism concrete: a saliency analysis shows that chain-of-thought works only when the question's information aggregates into the prompt structure before reasoning begins. When it doesn't, chain-of-thought fails, and for simple questions a direct question-to-answer flow does better. Content that won't integrate into the frame is dead weight, whether it's a contradictory adjective or a question whose semantics never reach the reasoning step. The frame is the engine; ungovernable content stalls it.
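The aggregation condition can be pictured as a score over a token-to-token saliency matrix. This is a minimal sketch under explicit assumptions: the matrices are made-up numbers rather than measurements, and scoring "aggregation" as the mean saliency of flows from question tokens into scaffold tokens is a hypothetical simplification, not the cited note's method. A high score corresponds to the question feeding the reasoning structure; a low score is the failure case, where information bypasses the scaffold.

```python
# Toy "aggregation" score over a saliency matrix. saliency[i][j] is how
# much information flowing from token j into token i matters (j < i,
# causal). The matrices and the mean-over-pairs score are hypothetical
# illustrations, not measurements or the cited analysis's method.

def mean_flow(saliency, sources, targets):
    """Average saliency of flows from source positions into later targets."""
    pairs = [(t, s) for t in targets for s in sources if s < t]
    if not pairs:
        return 0.0
    return sum(saliency[t][s] for t, s in pairs) / len(pairs)


def zeros(n):
    return [[0.0] * n for _ in range(n)]


# Positions: 0-1 question, 2-3 reasoning scaffold ("step by step"), 4 answer.
QUESTION, SCAFFOLD = [0, 1], [2, 3]

aggregating = zeros(5)
aggregating[2][0], aggregating[2][1] = 0.75, 0.5
aggregating[3][0], aggregating[3][1] = 1.0, 0.75

bypassing = zeros(5)
bypassing[2][1], bypassing[3][1] = 0.25, 0.25
bypassing[4][0], bypassing[4][1] = 1.0, 1.0   # question flows straight to answer

print(mean_flow(aggregating, QUESTION, SCAFFOLD))  # prints 0.75
print(mean_flow(bypassing, QUESTION, SCAFFOLD))    # prints 0.125
```

In the bypassing case the question's information still reaches the answer token directly, which mirrors the note's observation that simple questions can do better without the step-by-step scaffold.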
So the short answer, with a twist: Jabberwocky succeeds because empty slots let structure summon meaning, and 'colorless green ideas' fails because full-but-incompatible words give structure nothing coherent to summon. The same primacy of form over referential content explains why models can reason persuasively from invalid steps, and why they stumble when content can't be made to fit the frame. For a sharper edge on whether this 'form-first' meaning is a limit or a creative resource, *Can LLMs reason creatively beyond conventional problem-solving?* is a good next door to open.
Sources (5 notes)
*How do nonsense words create meaning without referents?* Jabberwocky achieves sense-of-nonsense through frame-activation on syntactic and prosodic cues alone, proving meaning-making does not require referential content. This reverses compositional accounts and shows frame-resonance is the primary meaning-making operation.
*Does logical validity actually drive chain-of-thought gains?* Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties, not logical validity, drive the gains. The model learns the form of reasoning, not genuine inference.
*Why does chain-of-thought reasoning fail in predictable ways?* CoT guides models to pattern-match reasoning structure rather than perform genuine inference. This explains distribution-bounded failures, why structural coherence matters more than content correctness, and why performance optimizes against interpretability.
*Why do some questions perform better without step-by-step reasoning?* Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.
*Can LLMs reason creatively beyond conventional problem-solving?* Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.