INQUIRING LINE

Can prompting for specific creative paradigms improve ideation diversity?

This explores whether explicitly naming creative modes in a prompt (e.g. asking for combinational, exploratory, or transformational ideas) actually widens the range of ideas a model produces — and what else the corpus says has to be true for that to work.


The most direct answer in the collection is that the paradigms themselves are real and currently unaddressed. Research has mapped creative reasoning into three distinct modes: combinational (mixing existing ideas), exploratory (searching within a space), and transformational (breaking the space's rules). It also found that existing LLM reasoning methods handle only conventional problem-solving, which is offered as a possible cause of 'diversity collapse' in ideation (see: Can LLMs reason creatively beyond conventional problem-solving?). So prompting for a paradigm isn't cosmetic; it targets a gap that default reasoning leaves open.
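As a minimal sketch of what "prompting for a paradigm" could look like in practice, the snippet below builds one prompt per creative mode. The one-line glosses paraphrase the three modes described above; the function name `build_paradigm_prompts`, the prompt wording, and the example topic are all illustrative assumptions, not a format taken from any of the cited notes.

```python
# Hypothetical sketch: one prompt per creative paradigm.
# The glosses paraphrase the three modes named in the text; the prompt
# wording and every name below are assumptions made for illustration.

PARADIGMS = {
    "combinational": "Combine two or more existing ideas into something new.",
    "exploratory": "Search for unexplored options within the accepted rules of the problem space.",
    "transformational": "Change or drop one rule of the problem space, then ideate under the new rules.",
}


def build_paradigm_prompts(topic: str, ideas_per_paradigm: int = 3) -> dict[str, str]:
    """Return a mapping from paradigm name to a prompt string for `topic`."""
    prompts = {}
    for name, instruction in PARADIGMS.items():
        prompts[name] = (
            f"Topic: {topic}\n"
            f"Creative mode: {name}\n"
            f"Instruction: {instruction}\n"
            f"Give {ideas_per_paradigm} ideas, one per line."
        )
    return prompts


if __name__ == "__main__":
    # Print the three prompts so they can be pasted into any model.
    for name, prompt in build_paradigm_prompts("reducing food waste").items():
        print(f"--- {name} ---\n{prompt}\n")
```

Keeping the three modes as separate prompts, rather than one combined instruction, makes it possible to compare their outputs against an unprompted baseline.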

The corpus also suggests *how* the prompting works, and it's more structural than verbal. Several notes converge on the idea that the shape of the prompt, not just its instructions, changes the breadth of thinking. Structuring a single model's reasoning as a dialogue between distinct internal agents beats monologue specifically on diversity, because monologue locks into one fixed strategy (see: Can dialogue format help models reason more diversely?). Branching, non-linear prompts that simulate multiple personas reproduce the cognitive synergy of a whole multi-agent system inside one model (see: Can branching prompts replicate what multi-agent systems do?). And allocating compute to diverse *abstractions* rather than just sampling more solutions enforces breadth-first exploration where depth-only chains underthink (see: Can abstractions guide exploration better than depth alone?). Read together, 'prompt for a creative paradigm' is one instance of a broader finding: giving the model an explicit structure for varied strategies is what keeps it from collapsing onto one.
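To make the "structure, not just instructions" point concrete, here is a rough sketch of a single prompt that asks one model to reason as a dialogue between several voices. It is not the prompt format used by DialogueReason or Solo Performance Prompting; `build_dialogue_prompt`, the persona names, and the round structure are assumptions chosen only to illustrate the idea.

```python
# Hypothetical sketch: a single-model, multi-voice dialogue scaffold.
# This is NOT the format of any cited method; all wording is assumed.

def build_dialogue_prompt(task: str, personas: list[str], rounds: int = 2) -> str:
    """Build one prompt that asks a model to role-play several voices."""
    if len(personas) < 2:
        raise ValueError("a dialogue needs at least two personas")
    roster = "\n".join(f"- {p}" for p in personas)
    return (
        f"Task: {task}\n\n"
        "Work through this task as a dialogue between the following voices, "
        "each of which must argue for a different strategy:\n"
        f"{roster}\n\n"
        f"Run {rounds} rounds. In each round every voice speaks once, "
        "prefixed with its name, and responds to at least one other voice.\n"
        "Finish with a section titled 'Synthesis' that lists the distinct "
        "strategies raised."
    )


if __name__ == "__main__":
    print(build_dialogue_prompt(
        "Propose ways to cut onboarding time for new engineers",
        ["Skeptic", "Builder", "Outsider"],
    ))
```

Requiring each voice to defend a different strategy is the design choice that targets the fixed-strategy failure of monologue; whether it helps a given model is an empirical question.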

But the collection adds two sharp caveats that decide whether paradigm-prompting helps or backfires. First, prompts are not universal: a 23-prompt benchmark across 12 models found that rephrasing and background-knowledge prompts boost cheap models while step-by-step reasoning reduces accuracy in high-performance ones, so paradigm framing that boosts a cheap model may not transfer to, and could even degrade, a frontier one (see: Do prompt techniques work the same across all LLM tiers?). The same conditional logic shows up at the question level: step-by-step structure only helps when the task actually needs it, and can underperform direct answering otherwise (see: Why do some questions perform better without step-by-step reasoning?). Diversity-inducing prompts are a tool matched to a task and a tier, not a free upgrade.
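Because the payoff depends on model and task, it helps to measure diversity rather than assume it. The sketch below scores a batch of ideas by mean pairwise Jaccard distance over lowercase word sets. This is a deliberately crude lexical proxy chosen for illustration, not a metric taken from any of the cited notes; embedding-based distances would be a natural substitute.

```python
# Hypothetical sketch: a crude lexical diversity score for a set of ideas.
# Jaccard distance over word sets is an assumed proxy, not a cited metric.
from itertools import combinations


def _tokens(text: str) -> set[str]:
    """Lowercase, whitespace-split word set for one idea."""
    return set(text.lower().split())


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over word sets; 0.0 if both are empty."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def mean_pairwise_distance(ideas: list[str]) -> float:
    """Average Jaccard distance over all unordered pairs of ideas."""
    pairs = list(combinations(ideas, 2))
    if not pairs:
        return 0.0  # fewer than two ideas: no pairs to compare
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

One way to use it: generate ideas for the same topic with and without paradigm framing, on each model tier of interest, and compare the two scores per tier. A higher score means less word overlap between ideas, which says nothing about their feasibility.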

Second, and most pointed: diversity without competence is a liability. Multi-agent teams beat solo ideation only when the agents have genuine domain expertise. Diverse teams without it underperform even one competent agent, because stimulation without grounding produces process losses instead of insight (see: Does cognitive diversity alone improve multi-agent ideation quality?). This reframes the whole question: prompting for creative paradigms can raise novelty, but novelty and usefulness pull apart. That's exactly what the head-to-head study of research ideas found: LLM ideas were rated *more* novel than expert ideas yet slightly less feasible, because expert knowledge constrains novelty while models roam wider (see: Do language models generate more novel research ideas than experts?).

The thing you might not have expected to learn: diversity isn't only a prompting trick; it can be trained in. Step-level critique inside the training loop counteracts the 'tail narrowing' that makes models converge prematurely, preserving solution diversity across self-training iterations, a benefit the note describes as more fundamental than the test-time accuracy gains (see: Do critique models improve diversity during training itself?). So the honest answer is: yes, prompting for specific creative paradigms can improve ideation diversity, but its payoff is conditional on model tier, task type, and underlying expertise, and the most durable gains in diversity may come from how the model was trained, not just how it's asked.


Sources (9 notes)

Can LLMs reason creatively beyond conventional problem-solving?

Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.

Can dialogue format help models reason more diversely?

DialogueReason, which structures a single model's internal reasoning as dialogue between distinct agents in separate scenes, overcomes monologue reasoning's fixed-strategy and fragmented-attention weaknesses, especially on tasks requiring multiple problem-solving approaches.

Can branching prompts replicate what multi-agent systems do?

Research shows single LLMs using dynamic persona simulation achieve multi-agent cognitive synergy without multiple model instances. Solo Performance Prompting validates that structured prompting techniques map directly to multi-agent debate architectures, enabling equivalent outcomes through structural equivalence.

Can abstractions guide exploration better than depth alone?

RLAD jointly trains abstraction and solution generators, showing that allocating test-time compute to diverse abstractions outperforms parallel solution sampling at large budgets. Abstractions create structured breadth-first exploration that prevents the underthinking failure mode of depth-only reasoning chains.

Do prompt techniques work the same across all LLM tiers?

A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.

Why do some questions perform better without step-by-step reasoning?

Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.

Does cognitive diversity alone improve multi-agent ideation quality?

Multi-agent teams substantially outperform solo ideation, but only when members possess genuine senior knowledge. Diverse teams without expertise underperform even a single competent agent, because cognitive stimulation without expertise triggers process losses instead of insight.

Do language models generate more novel research ideas than experts?

A study of 100+ NLP researchers found LLM-generated ideas rated as significantly more novel than human expert ideas (p<0.05), though slightly lower on feasibility. Expert knowledge constrains novelty, while LLMs explore wider conceptual combinations.

Do critique models improve diversity during training itself?

Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with re-evaluating whether prompting for specific creative paradigms actually improves ideation diversity—treating this as an open question, not a settled one.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–11/2025. Key constraints the library identified:
• Three distinct creative modes exist (combinational, exploratory, transformational), yet existing LLM reasoning methods handle only conventional problem-solving, which may explain 'diversity collapse' (~2025).
• Dialogue-based and non-linear prompts (multi-persona framing) outperform monologue on diversity; branching structures functionally mimic multi-agent reasoning inside one model (~2025).
• Prompting techniques are tier-dependent: rephrasing and background-knowledge prompts help weak models, while step-by-step reasoning *hurts* frontier models and only helps when the task genuinely requires decomposition (~2024–2025).
• Diversity without domain expertise backfires: multi-agent ideation only beats solo when agents have grounded knowledge; diverse incompetence underperforms one expert (~2025).
• LLM-generated research ideas rank more novel than expert ideas but less feasible—novelty and usefulness decouple (~2024).
• Critique-based training (not just prompting) preserves solution diversity by counteracting premature convergence (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2409.04109 (2024-09) — LLM research ideas vs. expert ideas, novelty–feasibility trade-off.
• arXiv:2508.04575 (2025-08) — Multi-agent ideation quality depends on expertise, not just diversity.
• arXiv:2511.20471 (2025-11) — Universe of Thoughts: enabling creative reasoning.
• arXiv:2505.07049 (2025-05) — DialogueReason: dialogue-based reasoning in LLMs.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, o3, Claude 3.5+), training methods (RL critique fine-tuning, universe-of-thoughts pretraining), or orchestration tooling (multi-agent frameworks, memory/caching) have since relaxed or overturned it. Separate the durable question (does structure genuinely shape ideation breadth?) from the perishable limitation (current models collapse on diversity unless prompted). Cite what resolved it plainly.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months, especially recent papers claiming that paradigm-prompting is no longer needed or that training-time fixes make prompt engineering obsolete.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Do reasoning models trained on critique naturally emit diverse paradigms without explicit prompting? (b) Does end-to-end training on multi-agent ideation tasks eliminate the model-tier conditional, or does it amplify it?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
