SYNTHESIS NOTE

Can we measure prompt quality independent of model outputs?

This explores whether prompt quality has measurable, learnable dimensions beyond intuition. The research asks if prompts can be evaluated by their communicative, cognitive, and instructional properties rather than by their results.

Synthesis note · 2026-03-28 · sourced from Prompts Prompting

"What Makes a Good Natural Language Prompt?" (Long et al., 2025) introduces the first systematic framework for evaluating prompt quality independent of model performance. Rather than measuring prompts by their outputs, the framework measures prompts by their communicative, cognitive, and instructional properties — treating prompt quality as a human-facing design problem.

The six dimensions:

Communication (from Grice's Maxims): token quantity (optimal information density), manner (clarity and directness), interaction and engagement (encouraging clarification), politeness (respectful tone — impolite prompts measurably degrade performance across tasks and languages).

Cognition (from Cognitive Load Theory): manage intrinsic load (break complex tasks into steps aligned with LM capabilities), reduce extraneous load (minimize unnecessary complexity and redundancy), encourage germane load (engage the model's prior knowledge and deep working memory).

Instruction (from Gagné's Nine Events): objectives (explicit task specification), external tools (guiding when to use external resources), metacognition (self-monitoring and self-verification), demonstrations (examples and counterexamples), rewards (feedback mechanisms).

Logic and Structure: structural logic (coherent progression between components), contextual logic (consistency of instructions, terminology, and facts across turns).

Hallucination: hallucination awareness (guiding factual, evidence-based responses), balancing factuality with creativity.

Responsibility: bias, safety, privacy, reliability, societal norms.
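To make the taxonomy concrete, the six dimensions can be written down as a simple rubric data structure. The sketch below is illustrative only. The snake_case property labels are paraphrases of the list above, not identifiers from Long et al. (2025). The 0 to 1 rating scale and the choice to average properties into a per-dimension score are assumptions, not the paper's scoring procedure.

```python
# Hypothetical encoding of the six dimensions and their properties, paraphrased
# from the list above. Labels are illustrative, not taken from the paper.
RUBRIC: dict[str, list[str]] = {
    "communication": ["token_quantity", "manner", "interaction_engagement", "politeness"],
    "cognition": ["intrinsic_load", "extraneous_load", "germane_load"],
    "instruction": ["objectives", "external_tools", "metacognition", "demonstrations", "rewards"],
    "logic_structure": ["structural_logic", "contextual_logic"],
    "hallucination": ["hallucination_awareness", "factuality_creativity_balance"],
    "responsibility": ["bias", "safety", "privacy", "reliability", "societal_norms"],
}


def dimension_scores(ratings: dict[str, float]) -> dict[str, float]:
    """Average per-property ratings (assumed 0-1 scale) into per-dimension scores.

    Assumption: a plain mean per dimension. Properties without a rating are
    skipped, and a dimension with no rated properties is left out of the result.
    Ratings for labels not in RUBRIC are ignored.
    """
    scores: dict[str, float] = {}
    for dimension, properties in RUBRIC.items():
        rated = [ratings[p] for p in properties if p in ratings]
        if rated:
            scores[dimension] = sum(rated) / len(rated)
    return scores
```

For example, a prompt rated only on `structural_logic` (1.0) and `contextual_logic` (0.5) would receive a single `logic_structure` score of 0.75. Keeping the rubric as data rather than code makes it easy to add or rename properties as the framework evolves.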

The empirical findings reveal non-obvious correlations. Structural logic strongly correlates with contextual logic — well-organized prompts tend to be internally consistent. Hallucination awareness correlates with reliability awareness. And optimizing intrinsic or germane cognitive load naturally clarifies objectives — as you manage the model's cognitive burden, task specification emerges. This suggests that prompt quality is not a flat checklist but a structured space where improvements in one dimension cascade to others.

The practical recommendation: "optimizing prompts for directness, clarity, and conciseness may potentially improve token efficiency, logical coherence, and reduce extraneous cognitive load." This gives concrete content to the custodial skill that the note "How does LLM-mediated search change what expertise requires?" identifies as missing: prompt literacy is not just knowing how LLMs work, but knowing how to communicate with them according to measurable principles.

The framework also reveals research gaps: communication properties are most studied for real-world chat, cognition properties for evaluation suites, instruction properties for NLU tasks — but many cross-dimension interactions remain unexplored. Politeness effects are surprisingly robust across generation tasks, potentially reflecting training biases toward benign queries.


Original note title

prompt quality has six evaluable dimensions grounded in Gricean maxims cognitive load theory and instructional design