INQUIRING LINE

Designers treat an LLM like clay to shape around a user — engineers treat it like plumbing to connect.

What unique perspective do designers bring to LLM adaptation that engineers might miss?

This explores what designers — people trained in user-centered, material-driven craft — notice about shaping LLM behavior that an engineering mindset focused on pipelines, models, and tooling tends to overlook.

This question reads as: when you put a designer rather than an engineer in front of an LLM, what do they *see* differently? The corpus points to a recurring answer — designers treat the model as an adaptable design material rather than a system to be wired up, and that reframing surfaces things the engineering stack misses.

The clearest evidence is the Canvil work, where designers shaped LLM behavior through system prompts and structured tinkering inside a Figma widget, with no engineering expertise required (see "Can designers shape LLM behavior without deep technical knowledge?"). What they brought was not technical depth but user-centered judgment about how the model should behave in front of a real person. That judgment is exactly the layer engineers tend to skip, because the engineering instinct is to ask "is the action grounded, is the harness reliable, does the pipeline hold" (see "Where does agent reliability actually come from?" and "Can you turn an LLM into an agent by just fine-tuning?"). Those are real problems, but they are about whether the system works, not about whether it works *for someone*.

The deeper thing designers carry is that human-centered objectives resist universal solutions: what counts as harm or benefit depends on whose perspective you take, and high-level guidelines can't operationalize that for you (see "Can human-centered LLM design ever achieve universal solutions?"). An engineer optimizing a metric is making implicit value choices; a designer's habit is to make those choices explicit and revisable, tied to a specific stakeholder. That's a different unit of analysis entirely: the person, not the benchmark.

Designers also tend to notice interaction failures that don't show up as errors in logs. LLMs default to *static* grounding: they retrieve and respond without the clarification loops humans use to build shared understanding, so intent can silently diverge (see "Why do language models skip the calibration step?"). An engineer sees a successful response; a designer sees a missing repair step. The same eye catches that adding more agentic tooling doesn't fix document-editing reliability, because the breakdown is upstream in judgment about *what* to change, not in the interface (see "Can better tools fix LLM document editing errors?"). That is a distinctly design-flavored diagnosis of an engineering-flavored fix.
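The contrast between static grounding and a clarification loop can be made concrete with a small sketch. Everything here is hypothetical and illustrative: the `Turn` type, the `AMBIGUOUS_TERMS` table, and the keyword-based ambiguity check stand in for whatever a real system would use, and no model is called. The point is only the shape of the interaction: the static path answers at once, while the dynamic path inserts a repair step before answering.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    kind: str  # "clarify" or "answer"
    text: str


# Hypothetical lookup: vague terms mapped to the question a person would ask back.
AMBIGUOUS_TERMS = {
    "shorter": "Shorter by how much: one sentence or one paragraph?",
    "better": "Better in what sense: clearer, more formal, or more persuasive?",
}


def find_ambiguities(request: str) -> List[str]:
    """Return clarifying questions for every vague term found in the request."""
    lowered = request.lower()
    return [q for term, q in AMBIGUOUS_TERMS.items() if term in lowered]


def static_respond(request: str) -> Turn:
    """Static grounding: respond immediately, with no check on intent."""
    return Turn("answer", f"Edited per request: {request}")


def dynamic_respond(
    request: str, ask_user: Callable[[str], str], max_questions: int = 2
) -> List[Turn]:
    """Dynamic grounding: ask before answering when the request is vague."""
    transcript: List[Turn] = []
    clarifications: List[str] = []
    for question in find_ambiguities(request)[:max_questions]:
        transcript.append(Turn("clarify", question))
        clarifications.append(ask_user(question))
    grounded = request
    if clarifications:
        grounded = f"{request} ({'; '.join(clarifications)})"
    transcript.append(Turn("answer", f"Edited per request: {grounded}"))
    return transcript


if __name__ == "__main__":
    for turn in dynamic_respond("Make the intro shorter", lambda q: "one sentence"):
        print(turn.kind, "->", turn.text)
```

In a log, both paths end in a successful "answer" turn. Only the transcript shows whether a repair step happened, which is the gap the designer's eye is trained to catch.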

The surprise worth leaving with: even framing is a design decision with consequences. Calling LLM errors "hallucinations" points fixes at perception or memory, the wrong layer entirely, when the real mechanism is statistical fabrication (see "Should we call LLM errors hallucinations or fabrications?"). And there's a humbling counterweight: LLMs already produce *feasible* design solutions well, but lag humans on *novelty* (see "Why do LLMs excel at feasible design but struggle with novelty?"). So the designer's contribution to LLM adaptation isn't generating more options, which the model already does. It's the perspective on whom it's for, where it quietly fails its user, and what we're really naming when we name its flaws.

Sources (8 notes)

Can designers shape LLM behavior without deep technical knowledge?

Canvil demonstrates that designers can effectively shape LLM behavior via a low-barrier Figma widget for prompt authoring and testing, bringing user-centered judgment directly into model adaptation without requiring engineering expertise.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Can you turn an LLM into an agent by just fine-tuning?

Converting LLMs to action-capable systems requires four distinct stages: curating action-environment-user datasets, training for action grounding, integrating agent infrastructure with memory and tools, and rigorous safety evaluation. The surrounding system and harness determine whether actions are grounded or hallucinated.

Can human-centered LLM design ever achieve universal solutions?

Research shows that optimal LLM design paths depend on stakeholder identity and how contested concepts like harm are operationalized. High-level guidelines fail to capture real-world nuance, leaving developers to make implicit value choices rather than explicit, revisable ones.

Why do language models skip the calibration step?

LLMs operate in static grounding mode—retrieving data and responding without clarification loops. Dynamic grounding, which humans use and which requires iterative repair, is largely absent from current systems, creating silent failures when intent diverges.

Can better tools fix LLM document editing errors?

DELEGATE-52 shows that agentic tool access fails to improve performance on long-horizon document tasks. The degradation mechanism originates upstream in the model's judgment about what to change, not in editing interface limitations.

Should we call LLM errors hallucinations or fabrications?

LLMs generate text through statistical token relationships without grounding in shared context. Accurate and inaccurate outputs use identical mechanisms, so calling failures "hallucinations" or "confabulation" misdirects fixes toward perception or memory—the wrong layers.

Why do LLMs excel at feasible design but struggle with novelty?

Expert evaluation shows LLM-generated conceptual designs score higher on feasibility and usefulness but lower on novelty compared to crowdsourced human solutions. Few-shot learning further reduces diversity while improving quality alignment.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows (2.42 match)
Conceptual Design Generation Using Large Language Models (1.71 match)
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency (1.71 match)
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents (1.69 match)
LLMs Corrupt Your Documents When You Delegate (1.69 match)
LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents (1.68 match)
Conversational Alignment with Artificial Intelligence in Context (1.61 match)
Canvil: Designerly Adaptation for LLM-Powered User Experiences (0.89 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst auditing whether designer-centered perspectives on LLM adaptation remain differentiated from engineering approaches, or whether newer methods have collapsed the distinction.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat these as perishable claims requiring re-test:
• Designers adapt LLMs via system prompts and structured interaction design *without* engineering expertise, treating the model as a malleable design material; engineers optimize for reliability and grounding (Canvil, 2024-01).
• Static grounding (retrieve-and-respond) misses the clarification loops humans use; designers notice intent drift; engineers optimize response validity (2025–2026 papers).
• Agentic tooling does not improve document-editing reliability; the failure is upstream in judgment about *what* to change, not tool availability (arXiv:2604.15597, 2026-04).
• LLMs generate feasible design solutions but lag on novelty; designers contribute perspective on *whom* it serves and where it silently fails users (arXiv:2306.01779, 2023-05).
• Framing errors as "hallucinations" misdirects fixes; the mechanism is statistical fabrication—a diagnostic mistake with consequences (2025–2026 literature).

Anchor papers (verify; mind their dates):
• arXiv:2401.09051 (Canvil, 2024-01): designerly adaptation via system prompts in Figma.
• arXiv:2407.08790 (Large Models of What?, 2024-07): engineering vs. linguistic agency frame.
• arXiv:2604.15597 (LLMs Corrupt Your Documents, 2026-04): document-editing failure diagnosis.
• arXiv:2605.06901 (Reflections and New Directions, 2026-05): human-centered LLM design directions.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, assess whether foundation model scale, instruction-tuning, preference alignment (RLHF variants), interpretability tooling, or new agent orchestration (memory systems, meta-learning per arXiv:2603.17187) have since relaxed the designer–engineer divide. Separate the durable question (do design and engineering perspectives remain orthogonal?) from the perishable claim (e.g., "system prompts are enough"). Cite what changed it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—e.g., papers claiming engineers now routinely embed user-centered judgment, or designers now require deep systems knowledge, or the distinction is obsolete.
(3) Propose 2 research questions that ASSUME the regime has shifted: (a) what perspective arises when designers and engineers co-adapt LLMs, and what does each still contribute?; (b) is the designer's edge in LLMs really about values/intent, or does it dissolve under adversarial pressure or scaling?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
