Why do paraphrased definitions work better than expert ones?
When instructing LLMs to classify argument schemes, should we use formal Walton definitions or LLM-generated paraphrases? This note explores which source better enables reliable scheme recognition, and why.
When the task is to tell an LLM what an argument scheme is so it can recognize one, two strategies are available: paste in the formal Walton definition (the normative source) or generate a description with another LLM (an operational paraphrase). Intuition says the formal definition wins: it is the source of truth, written by domain experts. The reported evaluation shows the opposite: LLM-generated descriptions yield better classification performance than formal definitions.
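The two strategies can be sketched as prompt builders. This is a minimal illustration, not the setup used in the evaluation: the function names, the prompt wording, and the `generate` callable (standing in for whatever LLM client is available) are all assumptions, and so is the choice to hand the formal definition to the paraphrasing model.

```python
from typing import Callable, Dict


def build_classification_prompt(descriptions: Dict[str, str], argument: str) -> str:
    """Assemble a classification prompt from scheme name -> description pairs.

    The same builder serves both strategies; only the description text differs.
    """
    lines = ["Classify the argument into exactly one of the schemes below.", ""]
    for name, text in descriptions.items():
        lines.append(f"- {name}: {text}")
    lines += ["", f"Argument: {argument}", "Answer with the scheme name only."]
    return "\n".join(lines)


def paraphrase_descriptions(
    formal: Dict[str, str], generate: Callable[[str], str]
) -> Dict[str, str]:
    """Strategy 2: have an LLM restate each formal definition in plain language.

    `generate` is a hypothetical callable (prompt in, text out), not a real API.
    """
    paraphrased = {}
    for name, definition in formal.items():
        request = (
            f"Describe the argument scheme '{name}' in plain language, "
            "with a short example.\n\n"
            f"Formal definition:\n{definition}"
        )
        paraphrased[name] = generate(request).strip()
    return paraphrased
```

Strategy 1 passes the formal definitions straight to `build_classification_prompt`; strategy 2 passes the output of `paraphrase_descriptions` instead. Keeping the rest of the prompt identical means any difference in classification quality can be attributed to the description source.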
The mechanism is worth taking seriously because it inverts a common assumption in prompt engineering. Formal definitions are written for readers who already share a technical vocabulary. They presuppose the reader can decode terms like "presumptive inference," "warrant," and "defeasible conclusion." An LLM-generated description rewrites the scheme in wording closer to the distribution a model was trained on: less precise, more redundant, anchored to examples and paraphrases of the kind seen during training. A model reads an LLM-written paraphrase more easily than it reads the original.
This is operationalization-beats-definition as a prompting principle. The same lesson appears in instruction-tuning datasets, where rewriting expert instructions in a conversational style outperforms preserving the original. The model is not "dumb" for failing on the formal definition; it is reading the definition through a distribution shaped by web text, where formal logical vocabulary is rare. Paraphrasing into the training distribution is the cheap fix.
The deeper implication is that normative sources and operational prompts are different artifacts. A normative source aims for unambiguous truth; an operational prompt aims for reliable behavior. The two optimize different objectives and produce different texts. For task instructions, optimize for the second.
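Optimizing for reliable behavior means measuring it. A minimal sketch of how one might compare prompt variants on a labeled set follows; the `classify` callable (prompt in, predicted scheme name out), the variant names, and exact-match scoring are assumptions made for illustration, not the evaluation protocol behind the result above.

```python
from typing import Callable, Dict, List, Tuple

PromptBuilder = Callable[[str], str]  # argument text -> full prompt


def accuracy(
    classify: Callable[[str], str],
    build_prompt: PromptBuilder,
    labeled: List[Tuple[str, str]],
) -> float:
    """Fraction of (argument, gold scheme) pairs the classifier labels correctly."""
    if not labeled:
        raise ValueError("labeled set must not be empty")
    correct = 0
    for argument, gold in labeled:
        predicted = classify(build_prompt(argument))
        # Case-insensitive exact match after trimming whitespace.
        if predicted.strip().lower() == gold.strip().lower():
            correct += 1
    return correct / len(labeled)


def pick_variant(
    variants: Dict[str, PromptBuilder],
    classify: Callable[[str], str],
    labeled: List[Tuple[str, str]],
) -> Tuple[str, Dict[str, float]]:
    """Score every prompt variant and return the best name plus all scores."""
    scores = {
        name: accuracy(classify, build_prompt, labeled)
        for name, build_prompt in variants.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores
```

Here `variants` would map a label such as "formal" or "paraphrase" to a function that wraps the corresponding description set. The selection criterion is observed behavior on the task, not fidelity to the normative source.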
Inquiring lines that use this note as a source (6)
This note is a source for these synthesized inquiries.
- Why can LLMs identify argument structure but not check warrants?
- Why do smaller LLMs fail at zero-shot argument scheme classification?
- Why does scheme classification require more cognitive load than identifying premises?
- Does compressing Walton's schemes into nine categories make LLM classification easier?
- Can LLM-generated descriptions of schemes outperform formal dictionary definitions for prompting?
- Why do LLM descriptions of argument schemes work better than formal definitions for classification?
Related concepts in this collection (2)
- Can large language models classify argument schemes reliably?
  Explores whether LLMs can recognize Walton's 60+ argument schemes (abstract patterns of reasoning rather than surface features) and what conditions enable accurate classification.
  Relation: same paper, the size-and-format dependency that motivates description-based prompting.
- Can structured argument prompts make LLM reasoning more rigorous?
  Does requiring language models to explicitly check warrants, backing, and rebuttals, rather than reasoning freely, improve reasoning quality and catch failures that standard step-by-step prompting misses?
  Relation: another case where operationalizing argument theory into prompt structure beats handing models the theory directly.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Can Large Language Models Understand Argument Schemes?
- Adam's Law: Textual Frequency Law on Large Language Models
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
- Large Language Models as Planning Domain Generators
- Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs
- AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
- Process Reward Models That Think
- Measuring Faithfulness in Chain-of-Thought Reasoning
Original note title
LLM-generated descriptions of argument schemes outperform formal Walton definitions for prompting scheme classification