Why do LLMs excel at feasible design but struggle with novelty?
When LLMs generate conceptual product designs, they produce more implementable and useful solutions than humans but fewer novel ones. This explores why domain constraints flip the novelty advantage seen in research ideation.
Expert evaluation of LLM-generated conceptual design solutions, compared against crowdsourced ones, reveals a profile that INVERTS the research ideation finding:
- Feasibility: LLMs higher (solutions are more technically implementable)
- Usefulness: LLMs higher (solutions are more relevant to the design prompt)
- Novelty: LLMs lower (solutions are less unique relative to the existing design space)
Few-shot prompting constrains output further: it makes LLM solutions more similar to the crowdsourced examples (improving quality alignment) but reduces the diversity of solutions the LLM generates.
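One way to make the diversity claim concrete is to embed each generated solution and average the pairwise cosine distance across the set; a lower value means the solutions cluster more tightly. This is a minimal sketch under stated assumptions: the source does not say how diversity was measured, the function names are hypothetical, and the toy vectors stand in for real sentence embeddings rather than data from the study.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(vectors):
    """Average cosine distance over all unordered pairs; higher means more diverse."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # a single solution has no spread
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy embeddings, NOT results from the study:
# zero-shot outputs point in unrelated directions, few-shot outputs
# cluster near the demonstrated examples.
zero_shot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
few_shot = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]

print(mean_pairwise_distance(zero_shot))
print(mean_pairwise_distance(few_shot))
```

Under this metric, a narrowing effect from few-shot examples would show up as a lower mean pairwise distance for the few-shot set than for the zero-shot set.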
This inverts the finding in "Why do LLMs generate more novel research ideas than experts?", where LLM research ideas were rated MORE novel but LESS feasible than human expert ideas. The critical variable is domain structure:
- Unconstrained domains (research ideation): LLMs generate without the expert constraints that limit human novelty → MORE novel, LESS feasible
- Constrained domains (conceptual design): feasibility constraints and evaluation criteria push LLMs toward safe, implementable solutions → MORE feasible, LESS novel
The pattern suggests that domain structure determines which side of the dissociation described in "Can LLMs generate more novel ideas than human experts?" dominates. In design, the evaluation criteria (feasibility, usefulness) are embedded in the prompt, channeling generation toward conservative solutions. In research ideation, evaluation criteria are absent from the prompt, allowing unconstrained generation.
The few-shot finding connects to "How much does demo position alone affect in-context learning accuracy?": examples constrain not just accuracy but creative scope, and each example narrows the generative space.
The Pron vs Prompt contest (2024) provides complementary evidence from creative writing specifically. In a direct contest between Patricio Pron (an award-winning novelist) and GPT-4, evaluated by literature critics and scholars using a Boden-inspired creativity rubric across 5,400 manual assessments, "LLMs are still far from challenging a top human creative writer." The authors conclude that "reaching such level of autonomous creative writing skills probably cannot be reached simply with larger language models." This extends the feasible-not-novel pattern beyond design: LLMs generate competent but uncreative output across both design and literary domains. Source: Arxiv/Prompts Prompting.
Inquiring lines that use this note as a source (15)
This note is a source for these synthesized inquiries.
- Why do LLMs generate ideas that sound novel but fail during execution?
- How do constrained versus unconstrained domains flip LLM novelty patterns?
- How can LLMs evaluate their own creative outputs for utility and novelty?
- Why do LLM-generated ideas score higher novelty yet lower feasibility than expert ideas?
- Why do LLMs plateau on creativity tasks while humans reach further?
- How does prompt design alter what kind of creativity LLMs can express?
- Why do LLM research ideas lack diversity despite high average novelty?
- What makes a novel research idea practically infeasible for implementation?
- Why do LLMs generate novel ideas but lack evaluative commitment?
- Do LLMs generate more novel ideas than they can evaluate?
- Why do models generate creative ideas but fail to evaluate their legitimacy?
- What makes novelty assessment harder to automate than idea generation?
- Can LLMs generate more novel research ideas than human experts?
- Do novelty and feasibility always trade off in idea generation?
- What unique perspective do designers bring to LLM adaptation that engineers might miss?
Related concepts in this collection (5)
- Why do LLMs generate more novel research ideas than experts?
  LLM-generated research ideas are statistically more novel than those from 100+ expert researchers, but the mechanisms behind this advantage and its practical implications remain unclear. Understanding this paradox could reshape how we use AI in creative knowledge work.
  Relation: inverted in constrained design domains
- Can LLMs generate more novel ideas than human experts?
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  Relation: domain structure determines which side of the dissociation dominates
- Why do LLMs generate novel ideas from narrow ranges?
  LLM research agents produce individually novel ideas but cluster them in homogeneous sets. This explores why high average novelty coexists with poor diversity coverage and what it means for automated ideation.
  Relation: diversity collapse occurs in both domains but through different mechanisms
- How much does demo position alone affect in-context learning accuracy?
  Moving demonstrations from prompt start to end without changing their content produces surprisingly large accuracy swings. Does spatial position in the prompt matter more than what demonstrations actually contain?
  Relation: few-shot constrains creative scope
- Why does AI writing sound generic despite being grammatically correct?
  Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
  Relation: the design-domain inversion may be a manifestation of the grammar-rhetoric gap. In constrained design, LLMs produce structurally sound (grammatically competent) but evaluatively conservative (rhetorically inert) solutions; the same absence of evaluative stance-taking that makes academic writing generic makes design solutions feasible but unoriginal.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Conceptual Design Generation Using Large Language Models
- The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
- Has the Creativity of Large-Language Models peaked? —an analysis of inter- and intra-LLM variability —
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
- Agent Laboratory: Using LLM Agents as Research Assistants
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
Original note title:
LLMs generate more feasible and useful but less novel conceptual design solutions than humans — few-shot learning decreases diversity further