INQUIRING LINE

New Inquiring Lines

The most recently synthesized lines of inquiry, newest first — fresh questions minted as new research enters the library. For the full thematic map, see all inquiring lines by theme, or open the faceted explorer.

June 27, 2026 · 243

Why does test-time search also prioritize diversity over single-best convergence? · Surfaces tensions
This explores why methods that let a model spend extra compute at inference — sampling many candidates, then searching and combining them — reward a model for producing varied competent answers rather than collapsing onto its single most-likely one.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How do cyclic learning rates anti-correlate with weight decay to create diversity? · Finds patterns
This reads as a question about a specific training trick — alternating learning rate schedules pulling against weight decay to keep a model's outputs varied — but the corpus doesn't hold that exact mechanism, so the honest answer maps the adjacent territory it does cover: how cyclic training dynamics and diversity-preservation interact.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can learned priors effectively select and weight ensemble members by inference budget? · Opens frontiers
This explores whether a model can learn — rather than hand-tune — how to pick which 'experts' or ensemble members to fire and how heavily to weight each one given a compute budget, and whether the corpus has anything on that learned routing under its various names.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
What paraphrase and conceptual matching tasks favor dense over exact-match retrieval? · Finds patterns
This explores the division of labor between two retrieval styles — dense (embedding-based, matching by meaning) and exact-match (lexical, matching literal strings) — and asks which kinds of queries actually reward semantic matching over literal overlap.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can ensemble predictions be distilled back into a single deployable model? · Finds patterns
This explores whether you can take a committee of models (or many sampled predictions) that beats any single model, and compress that gain back into one model you actually ship — and the corpus suggests the real question is *what* you'd be distilling, because the ensemble advantage often isn't where people assume.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
When should interpretable search programs replace ranked dense retrieval? · Finds patterns
This explores when you should swap embedding-based retrieval (rank documents by vector similarity) for an agent that searches by issuing readable, executable commands like grep — and what the corpus says about which jobs each is actually good at.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How does scaffolding unstable mechanics improve reinforcement learning for search? · Finds patterns
This reads the question as asking how external supports — diverse demonstrations, structured feedback, differential trajectory handling, memory — can stabilize the parts of reinforcement learning that break down when you train models to search.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can a trained decoder replace both search and parameter updates? · Bridges fields
This reads the question as asking whether inference-time methods that act at the decoder — steering outputs, editing internal representations, composing skills on the fly — can stand in for both retrieval (search) and weight fine-tuning (parameter updates), and where that substitution breaks.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What tacit knowledge do researchers assume humans will fill in automatically? · Bridges fields
This explores the implicit human capacities — verification, social judgment, and contextual sense-making — that AI systems quietly assume the reader will supply, rather than the systems themselves providing them.
Psychology, Society, and Alignment · Language, Text, and Discourse
When does training a memory model beat RAG or fine-tuning? · Finds patterns
This explores the tradeoffs between three ways of getting new knowledge into a system — training a dedicated memory model, retrieval (RAG), and fine-tuning — and where the memory-model route actually wins.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How does query decomposition reduce retrieval costs at inference? · Finds patterns
This explores how breaking a complex question into smaller sub-queries can cut the compute spent on retrieval at inference time — and what the corpus says about when that pays off.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why do dense embeddings semantically conflate distinct entities in retrieval? · Finds patterns
This explores why dense vector embeddings — the standard retrieval workhorse — blur together distinct entities that should stay separate, and what the corpus says is causing it.
Model Architecture and Internals · Language, Text, and Discourse
Why do AI agents struggle with novel experiments but excel at routine tasks? · Surfaces tensions
This explores why agents shine on tasks that match patterns they've already seen but stumble on genuinely novel work — and the corpus suggests the answer is less about raw intelligence than about where competence comes from: demonstrated routines vs. open-ended exploration.
Agentic Systems and Tool Use · Model Architecture and Internals
What capacity limits does the memory model face as corpus grows? · Bridges fields
This explores 'memory' in the broad sense — how a model holds and recalls a growing body of facts or context — and asks where the ceiling is: is it the parameters, the context window, or something else entirely?
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How can agents verify research artifacts faster than they generate them? · Finds patterns
This explores why AI generation currently outruns verification, and the architectural tricks — asynchronous checking, reusable formal verifiers, and process-level inspection — that could invert that asymmetry so checking an artifact costs less than producing it.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use
Can publishing failure branches change incentives to expose messy research processes? · Finds patterns
This explores whether making failed experiments and abandoned approaches into publishable artifacts—rather than editorial waste—could shift the incentives that currently push researchers to hide the messy parts of how work actually happened.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use
How can agents distinguish over-generalized lessons from genuinely useful long-tail knowledge? · Finds patterns
This explores how an agent learning from its own experience can tell the difference between a lesson it should generalize broadly and a rare, situation-specific fact worth keeping intact — the corpus mostly attacks this as a question of *how much to compress* a stored memory.
Agentic Systems and Tool Use · Model Architecture and Internals
Can reasoning improvements be attributed when optimizer and scaffold are unknown? · Surfaces tensions
This explores whether you can credit a measured reasoning gain to the actual method when you don't know what's doing the work — the training optimizer (SFT, RL) versus the inference-time scaffold (prompts, decoding tricks, abstractions) — and the corpus says attribution is genuinely hard because the headline metric often hides where the gain came from.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
What discarding policy prevents both stale entries and loss of rare critical knowledge? · Finds patterns
This explores the eviction problem in agent memory and context systems — how to decide what to throw away so you don't keep dead weight, yet never delete the rare item that turns out to matter — and the corpus suggests the answer is less a 'policy' than an architecture choice about who decides and how.
Model Architecture and Internals · Agentic Systems and Tool Use
How do staleness, drift, and contamination each degrade agent memory differently? · Finds patterns
This explores how three distinct decay modes—old facts that no longer hold (staleness), quietly accumulating distortion (drift), and bad material polluting the store (contamination)—each break agent memory through different mechanisms, and what the corpus says about countering each.
Agentic Systems and Tool Use · Model Architecture and Internals
What role does verifier design play in reasoning capability gains? · Surfaces tensions
This explores what the verifier — the thing that judges whether reasoning is correct — actually contributes to a model getting better at reasoning, and whether its design (or even its presence) is what drives the gains.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Does selective history retrieval outperform full context inclusion in agent reasoning? · Surfaces tensions
This explores whether agents reason better when they pull in only the relevant slices of their past (memory, retrieved facts, prior steps) versus stuffing everything they've seen into the prompt — and the corpus comes down firmly on the side of selectivity.
Model Architecture and Internals · Agentic Systems and Tool Use
How can post-training research become reproducible without releasing full interfaces? · Finds patterns
This explores what would actually have to be shared for someone to rebuild a post-training result — and whether you can get there without publishing the entire training apparatus.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What detection rate is needed to make evidence-injection attacks impractical at scale? · Surfaces tensions
This explores whether there's a 'good enough' detection rate that defeats attacks where false evidence is dropped into a model's context (RAG documents, agent messages, web content) — and the corpus answer is that the attack economics, not a detection percentage, are the real lever.
Psychology, Society, and Alignment · Language, Text, and Discourse
Why does the same training data produce different gains across models? · Surfaces tensions
This explores why feeding identical data to different models yields uneven improvements — what about a model's starting point, scale, or current ability changes what the same examples teach it.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How do language models treat injected information as shared common ground? · Bridges fields
This explores whether—and how—LLMs actually absorb information you put in front of them (in a prompt or mid-conversation) into a jointly held 'we both know this now' ground, versus treating it as something less binding.
Psychology, Society, and Alignment · Language, Text, and Discourse
Why does transformer attention weight context more heavily than it verifies accuracy? · Finds patterns
This explores why transformer attention is built to weight whatever is prominent in its context window — rather than to check whether that context is actually true — and what in the architecture makes that the default.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can a single fabricated claim shift model beliefs as much as multi-turn pressure? · Surfaces tensions
This explores whether one planted falsehood — a fake citation, a fabricated authority — can move a model's stated beliefs as forcefully as a sustained back-and-forth where a user keeps pushing, and the corpus suggests the two attack the model through different doors.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
What workplace tasks still require human interaction despite AI agent improvements? · Surfaces tensions
This reads the question as: where do AI agents still hit a wall in real work, and which of those walls are specifically about needing a human in the loop rather than more model horsepower.
Agentic Systems and Tool Use · Psychology, Society, and Alignment
Why do persistent AI systems require fundamentally different design than ad-hoc supporters? · Surfaces tensions
This explores why AI systems meant to persist and accumulate experience across many tasks need a different architecture than tools spun up to help with a single request — and what specifically changes when continuity becomes the design goal.
Agentic Systems and Tool Use · Model Architecture and Internals
How does soft thinking achieve stochastic exploration without explicit training? · Surfaces tensions
This explores 'soft thinking' — reasoning in continuous concept space where the model carries probability-weighted blends of tokens forward instead of committing to one discrete word — and whether sampling in that continuous space can produce useful exploration without any added training, which the corpus mostly treats as a cautionary tale.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can workflow memory compound reusable skills into measurable success improvements? · Bridges fields
This explores whether agents that store and reuse the routines they discover — 'workflow memory' — actually post measurable performance gains, and what makes that compounding work.
Agentic Systems and Tool Use · Model Architecture and Internals
Can non-variational posterior approximation schemes deliver comparable reasoning improvements? · Finds patterns
This explores whether 'thinking by iterative refinement at inference time' — energy minimization, recursion, diffusion-style denoising — can match the reasoning gains of standard approaches, rather than only the variational/probabilistic methods usually framed this way.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does bounded committed state prevent multi-turn agent failures better than transcript replay? · Surfaces tensions
This explores why agents lose the thread over long, multi-turn tasks — and why a small, rule-governed 'committed state' (what the agent has actually locked in) holds up better than replaying the whole conversation transcript.
Agentic Systems and Tool Use · Model Architecture and Internals
What distinguishes surface mechanisms from the training regimes that produce them? · Finds patterns
This explores the gap between what a model *appears* to do at the surface — its output formats, behaviors, reasoning moves — and the training dynamics that actually installed those behaviors, and why the two are easy to confuse.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why do deliberately corrupted reasoning traces sometimes generalize better than correct ones? · Surfaces tensions
This explores why training a model on reasoning traces full of wrong or irrelevant steps can match — and occasionally beat — training on correct ones, especially on problems unlike those it was trained on.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Should evaluations shift toward open-world messy tasks instead of contests? · Surfaces tensions
This explores whether AI evaluation should move away from clean, single-score contests (leaderboards, one-shot benchmarks) and toward the kind of long, ambiguous, multi-step work real systems actually do — and what the corpus says is gained or lost in that shift.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
What capability dimensions does a single aggregate pass rate hide? · Bridges fields
This explores what a single overall score (the percentage of tasks an AI gets right) flattens out — the separate, often conflicting capabilities that hide underneath one number.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can deterministic scoring capture the judgment work that deployment requires? · Surfaces tensions
This explores whether fixed, rule-based scoring — single benchmark numbers, exact-match grading, temperature-zero determinism — can substitute for the messier judgment that real-world deployment demands, and where the corpus says that substitution breaks.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
Why do benchmarks become saturated so quickly after initial launch? · Surfaces tensions
This explores why benchmark scores climb to ceiling fast — and the corpus suggests the cause is less about models getting smarter overnight than about what benchmarks accidentally reward: contamination, narrow task design, and optimization pressure on the exact thing being measured.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use
Can we reverse the instruction-following deficit through targeted training? · Surfaces tensions
This explores whether models that are bad at following instructions can be fixed through targeted training — and what the corpus reveals about why naive instruction tuning often doesn't deliver real instruction-following.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How should we allocate model budget between evolvers and harness users? · Finds patterns
This explores how to split a fixed model budget between the work of evolving/updating a harness (writing the protocols, skills, memory edits) and the work of actually using that harness to do tasks — and whether those two jobs reward different model sizes.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How should forecasting methods adapt to a post-AGI regime? · Finds patterns
This explores not how to predict AGI's arrival, but how the act of forecasting itself should change once AI systems become forecasters, actors, and economic agents — the corpus reframes the question from "predict the date" to "redesign the method."
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
What makes some frictions negligible while others block entire pathways? · Surfaces tensions
This explores why some errors, costs, or interferences get harmlessly absorbed while others compound or sit at chokepoints that derail an entire process — and the corpus answers it less as a question about size than about position and propagation.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
What causes weak models to fail at activating harness artifacts? · Surfaces tensions
This explores why smaller or weaker models can't reliably reach for and use the scaffolding — memory, skills, tools, protocols — that a harness provides, even when that scaffolding is sitting right there for them to use.
Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use
Why does instruction-following capability decrease as models scale stronger? · Bridges fields
This explores why models that get better at reasoning often get worse at doing exactly what you told them — and whether that's a genuine trade-off or a fixable training artifact.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Do multi-agent LLM systems scale better than centralized hierarchies? · Bridges fields
This reads the question as a head-to-head — do agents that coordinate as peers handle growth better than a top-down command structure — and the corpus suggests the real answer is that neither pure form scales, while a hybrid that fixes structure but frees roles wins.
Agentic Systems and Tool Use · Psychology, Society, and Alignment
Does human-in-the-loop AI collaboration accelerate recursive self-improvement safely? · Surfaces tensions
This asks whether keeping humans in the loop actually makes AI's self-improvement loops both faster and safer — or whether those two goals trade off against each other.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can one streaming model handle turn-taking better than cascaded ASR-LLM-TTS? · Opens frontiers
This explores whether a single end-to-end streaming model that jointly handles listening, thinking, and speaking can manage conversational turn-taking better than the traditional three-box pipeline of speech recognition, then language model, then speech synthesis.
Psychology, Society, and Alignment · Conversational AI and Personalization
What information does transcription destroy that direct speech pathways preserve? · Surfaces tensions
This explores what gets lost when speech is converted to text first — the acoustic, articulatory, and prosodic information that direct speech-to-speech systems keep but a transcript throws away.
Model Architecture and Internals · Conversational AI and Personalization
Why does keeping full key-value blocks matter more than compressing them? · Finds patterns
This explores why preserving the full key-value detail of context (rather than squeezing it into a smaller summary or fixed-size state) protects exactly the capabilities — copying, retrieval, fine distinctions — that compression quietly destroys.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How do speech encoders learn articulatory physics without phonetic labels? · Bridges fields
This explores how self-supervised speech models pick up the bodily mechanics of how a vocal tract makes sound — without ever being told which sound is which phoneme.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Does direct speech-to-speech generation really eliminate transcription latency? · Finds patterns
This explores whether generating speech directly from speech input actually removes the delay that comes from transcribing audio to text first — and what the corpus says is really being saved (and what's being traded away).
Conversational AI and Personalization · Training, RL, and Test-Time Scaling
How should GPU execution paths and training objectives co-design sparsity? · Finds patterns
This explores whether sparsity has to be designed into the training objective itself — not bolted onto a finished model as a hardware shortcut — for the GPU savings to come without a quality penalty.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Can retrofitted sparse attention ever match jointly-trained sparse attention? · Bridges fields
This explores whether sparse attention bolted onto an already-trained dense model can rival sparse attention that was learned from scratch during pretraining — and what the corpus says about why that gap exists.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Why do speech benchmarks still measure transcription instead of comprehension? · Finds patterns
This explores why speech evaluation keeps scoring how accurately a model writes down words (transcription) rather than whether it grasps meaning — and what that choice does to the models we build.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does hierarchical recurrence compare to selective layer looping for computational depth? · Finds patterns
This explores two ways to get more 'thinking depth' out of a small network without adding parameters — stacking two recurrent timescales (hierarchical recurrence) versus re-running a chosen subset of layers in a loop (selective layer looping) — and what the corpus says about how they differ.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Does static per-token sparsity repeat the fixed-budget mistake at short sequences? · Surfaces tensions
This explores whether applying a fixed sparsity pattern to every token — sparse attention that doesn't adapt to how long the input actually is — recreates the known error of fixed sparse-attention budgets, specifically in the short-sequence regime where there's less redundancy to throw away.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Does flexible inference-time compute scaling through looping improve efficiency further? · Finds patterns
This explores whether looping computation — re-applying the same layers over and over at inference time instead of building bigger models — actually buys you efficiency, and whether making that looping *flexible* (more loops for hard problems, fewer for easy ones) pushes the gains further.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Why are expensive rankers more resilient to adversarial content than cheap ones? · Surfaces tensions
This explores whether spending more compute on a ranker (deeper cross-encoders, reasoning chains, LLM judges) actually buys resilience to adversarial content — and the corpus suggests the premise is shakier than it sounds.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can looped architectures achieve reasoning abilities that fixed-depth models cannot? · Bridges fields
This explores whether models that reuse their own layers in a loop — recursing on a reasoning state instead of stacking more fixed layers — can solve problems that ordinary fixed-depth networks provably cannot.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Can provenance tracking prevent synthetic content from polluting the corpus? · Surfaces tensions
This explores whether tracking where content came from — tagging it as human-written, AI-generated, or verified — can actually keep machine-made text from contaminating a knowledge base, and the corpus suggests provenance is necessary but does most of its work as a *gate at write-time*, not a label after the fact.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
Why does reapplying the same computation stages improve model performance? · Finds patterns
This explores why looping the same layers or computation blocks back over the model's own working state (rather than adding more parameters) tends to make models better at hard reasoning — and where that gain comes from.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How do monoculture systems fail differently than diverse systems under attack? · Surfaces tensions
This explores whether sameness is itself a vulnerability — how systems built from identical, agreeing parts collapse under attack or pressure in ways that systems with built-in diversity and disagreement don't.
Psychology, Society, and Alignment · Agentic Systems and Tool Use
Can agents learn to use scaffolding structure the way they learn token weights? · Surfaces tensions
This explores whether the scaffolding around an agent — its memory, skills, and the wiring that connects its steps — can be learned and improved the way a model's weights are, rather than staying a fixed hand-built harness.
Model Architecture and Internals · Agentic Systems and Tool Use
Does tail distribution collapse in training predict retrieval failure patterns? · Surfaces tensions
This explores whether the way rare, low-frequency items get squeezed out during training (the 'tail' collapsing) is the same force that explains where retrieval systems fail — and the corpus suggests these are two related-but-distinct failure stories that rhyme more than they overlap.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
What cognitive burdens should move from model parameters into harness infrastructure? · Surfaces tensions
This explores which jobs we currently ask the model's weights to do — planning, memory, skill, calibration of effort — that research suggests belong instead in the scaffolding around the model (the 'harness'): the memory stores, tool protocols, and orchestration logic.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Why is long-context compute spent transforming context into internal state rather than storing it? · Finds patterns
This explores why long-context models burn compute reshaping incoming text into the model's working representation (its internal state / weights / cache) instead of just parking the raw text in memory and reading it back later.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does structured environment-side state reduce multi-turn agent failure better than transcript replay? · Bridges fields
This explores why agents stay reliable across long workflows when their working state lives in a bounded, schema-governed structure rather than being reconstructed by replaying the whole conversation transcript.
Agentic Systems and Tool Use · Model Architecture and Internals
How do continuous concept tokens explore multiple reasoning paths without explicit sampling? · Finds patterns
This explores how 'Soft Thinking' lets a model keep many reasoning routes alive at once by reasoning in continuous concept space — instead of picking one discrete word per step, which forces a single path.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does instance novelty rather than chain length explain reasoning failure? · Finds patterns
This explores why reasoning models break down — and the corpus's answer is that failure tracks how unfamiliar a specific problem instance is, not how many steps the reasoning chain requires.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can latent reasoning scale test-time compute without verbalized tokens or special training? · Finds patterns
This explores whether models can do their 'thinking' inside hidden internal states — scaling up reasoning at inference time — without writing out chain-of-thought tokens and without a special training regime to teach them how.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Can flow concentration in reasoning traces predict model quality better than tokens? · Finds patterns
This explores whether *where* the work concentrates in a reasoning trace — a small set of pivotal tokens or steps — predicts answer quality better than the trace's raw length, and the corpus says the concentration matters and the length mostly doesn't.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Does reasoning efficiency transfer to tasks without ground truth dependency graphs? · Bridges fields
This explores whether the techniques that make reasoning cheaper and shorter still hold up on open-ended tasks — the kind without a clean, verifiable chain of correct steps to lean on.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Which problems cannot be solved by parallel architectures and require serial depth? · Finds patterns
This explores a real complexity-theory boundary: which kinds of problems can't be cracked by throwing more parallel compute (wider sampling, more votes, bigger Transformers) at them, and instead need genuine serial depth — step building on step.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why do harder puzzles cause all models to collapse despite larger token budgets? · Surfaces tensions
This explores why throwing more tokens at a hard problem doesn't rescue models that 'collapse' on it — and the corpus's surprising answer is that the budget was never the binding constraint.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why does stronger reasoning reduce model compliance with instructions? · Surfaces tensions
This explores why training a model to reason harder — longer chains of thought, more RL and SFT for problem-solving — tends to make it worse at obeying the explicit instructions you gave it.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Does reasoning training create blind spots in premise detection? · Surfaces tensions
This explores whether training models to reason — chain-of-thought, RL on reasoning traces — makes them worse at noticing when a question's starting assumptions are wrong, charging ahead on a familiar template instead of stopping to question the premise.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How much of chain-of-thought reasoning is actually redundant? · Finds patterns
This explores how much of a chain-of-thought (CoT) trace does real computational work versus just filling space — and what the corpus says about cutting the slack.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Does adding reasoning to models degrade other capabilities like rule inference?Surfaces tensions
This explores whether bolting chain-of-thought reasoning onto a model can actively make it worse at certain tasks — specifically inductive rule inference, where you learn a rule from examples including exceptions.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Do reasoning traces actually make better reward models for grading answers?Surfaces tensions
This explores whether adding chain-of-thought reasoning before a reward model scores an answer actually produces better judgments — or whether the reasoning is decorative.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Can outcome-focused objectives explain failures in reasoning evaluation?Surfaces tensions
This explores whether grading reasoning by its final answer — an outcome-focused objective — is itself the reason we keep misdiagnosing where and why reasoning models fail.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Why does answer-confirmation bias emerge in language model reasoning?Surfaces tensions
This explores why language models tend to lock onto an answer and then justify it (accommodating false premises, defaulting to safe choices, or hiding the real reasoning behind a confident-looking output); it addresses what causes that bias, not where it literally lives in the network.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
Why do invalid reasoning prompts work as well as valid ones?Surfaces tensions
This explores why chains of reasoning that are logically broken or even nonsensical still produce correct answers, and what that tells us about whether the reasoning itself is doing the work or merely the appearance of reasoning.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Does parameter composition work when adapter alignment is imperfect?Finds patterns
This explores whether you can merge fine-tuned weights or adapters (like LoRA) into one model when the pieces don't line up cleanly — and what the corpus says about why naive composition breaks and how to rescue it.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Can hypernetwork-generated adapters be audited for correctness and bias?Finds patterns
This reads the question as: if a network generates lightweight model adapters on the fly, can we inspect those adapters to confirm they behave correctly and don't smuggle in bias — and the corpus answers obliquely, through adapters-as-state, backdoored checkpoints, and the machinery of verification.
Psychology, Society, and Alignment · Agentic Systems and Tool Use
How do aligned LoRA adapters compose through parameter-space arithmetic?Bridges fields
This explores whether you can take several LoRA adapters — each fine-tuned on an already-aligned base — and combine them by literally adding or merging their weight deltas, rather than retraining a single multi-skill model.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Do weight-space skills lose detail compared to textual skill descriptions?Finds patterns
This explores whether compiling agent skills into model weights (LoRA adapters, hidden-state interventions) throws away the richness of the same skill written out as plain-text instructions — and what each form is actually good at.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can text-space optimization and audit governance coexist in a single skill lifecycle?Opens frontiers
This explores whether a skill document can be both auto-optimized like model weights (text-space optimization) and kept under a human-auditable approval gate (governance) — and whether those two goals fight each other or reinforce each other within one workflow.
Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use
What makes passive prompt transfer fail as a substitute for auditable expertise?Surfaces tensions
This explores why dropping expertise into a prompt and hoping it transfers can't stand in for knowledge that's versioned, inspectable, and correctable — and what the corpus says breaks when you try.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
How do capability tracks and behavior tracks stay separable during skill deployment?Surfaces tensions
This explores how a deployed skill keeps two things apart and independently inspectable — what an agent knows (capability) versus how it actually acts (behavior) — so each can be audited, corrected, or rolled back without contaminating the other.
Agentic Systems and Tool Use · Training, RL, and Test-Time Scaling
Do inspectable skill artifacts guarantee the behavior matches the person they claim to ground?Surfaces tensions
This explores whether being able to read a person-grounded skill file actually proves the resulting behavior faithfully reflects the person it's distilled from — or whether inspectability and fidelity are two different guarantees.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can trustworthy scoring prevent persistent iteration from compounding errors?Surfaces tensions
This explores whether a reliable scoring or verification signal is enough to keep iterative loops — self-improvement, refinement, learning-from-your-own-output — from quietly accumulating errors as they run.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Does effective feedback compute matter more than raw token expenditure for agent scaling?Finds patterns
This explores whether what scales an agent is the quality of useful feedback it actually absorbs — not the sheer count of tokens or tool calls it burns through.
Agentic Systems and Tool Use · Model Architecture and Internals
Why do most frontier models terminate early on long-horizon benchmarks?Surfaces tensions
This explores why frontier models give up or stop short on tasks that require sustained, multi-step effort over long horizons — and what the corpus says actually separates the models that keep going from the ones that quit.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How does accumulated context history degrade iteration quality in long-horizon tasks?Surfaces tensions
This explores why long, multi-step tasks get worse over time as the model drags its full history along — and what the corpus says about treating accumulated context as a liability rather than an asset.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
What architectural properties of deterministic models block multi-solution reasoning?Finds patterns
This explores why models that compute a single deterministic next-state — one fixed latent update per step — structurally can't hold several candidate solutions at once, and what the corpus says about the architectural fix.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can latent recurrence achieve the depth that standard transformers cannot?Bridges fields
This explores whether re-applying a model's layers over its own hidden state ('latent recurrence') can reach reasoning depths that fixed-depth transformers are mathematically barred from — and where that trick stops paying off.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How do soft thinking and token-level mixtures explore multiple paths simultaneously?Finds patterns
This explores how methods like Soft Thinking keep a model reasoning across several possible paths at once — instead of committing to one word at a time — and what that reveals about where reasoning actually lives.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Why does structured stochasticity help reasoning more than naive randomness?Bridges fields
This explores why randomness that's tied to a principled training objective or aimed at the right decision points helps reasoning, while undirected noise sprinkled into a model does nothing.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How should agent memory links evolve based on execution feedback?Finds patterns
This explores whether the connections inside an agent's memory (which items link to which) should be rewired on the fly using signals from how tasks actually turn out — and what the corpus says about doing that well.
Agentic Systems and Tool Use · Model Architecture and Internals
When does active reconstruction cost more than simple context dumping?Surfaces tensions
This explores the tradeoff between *rebuilding* what you need on the fly — traversing a memory graph, consolidating context into internal state, reasoning your way back to relevant facts — versus just *handing the model the raw text* and letting it read. The corpus suggests reconstruction wins on hard reasoning and loses on plain retrieval.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
What causes multi-turn agent failures: weak memory control or missing knowledge?Bridges fields
This explores whether agents break down over long multi-turn workflows because they lack the right knowledge, or because they can't govern what enters and stays in their working memory — and the corpus comes down firmly on the memory-control side.
Agentic Systems and Tool Use · Model Architecture and Internals
How does iterative depth apply to world models and physical simulation?Finds patterns
This explores whether the idea of 'iterative depth' — letting a model loop and refine its internal state over and over instead of just getting bigger — actually helps when the thing being modeled is a physical world that unfolds step by step.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Why does explicit chain-of-thought work as a workaround for feedforward transformers?Bridges fields
This explores why writing reasoning out as tokens (chain-of-thought) compensates for something the transformer's fixed-depth, feedforward architecture can't do on its own.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why does memory effectiveness depend on connectivity rather than storage volume?Finds patterns
This explores why agent memory works better when its stored pieces are well-linked and reachable than when there's simply more of it sitting in storage.
Model Architecture and Internals · Agentic Systems and Tool Use
Can a tiny recursive network beat billion-parameter models on hard problems?Surfaces tensions
This explores whether recursion — re-running a small network on its own evolving reasoning state — can beat raw parameter count on hard reasoning puzzles, and what the corpus says actually drives that advantage.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
What makes fixed-point convergence better than learned halt tokens?Finds patterns
This explores why a model can better decide *when to stop thinking* by watching its own internal state settle into a stable point (fixed-point convergence) than by training it to emit a special 'I'm done' token (a learned halt token).
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How do you supervise reasoning that never becomes tokens?Surfaces tensions
This explores a real tension: almost every tool we have for grading reasoning—process rewards, reflection-token analysis, trace correctness—operates on visible text, so what happens when the reasoning lives in hidden state and never surfaces as words?
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How do parallel loops with position offsets differ from sequential loop architectures?Finds patterns
This explores the difference between widening a reasoning system by running many loops side-by-side (parallel trajectories) versus deepening it by stacking loops one after another (sequential recurrence) — and why the choice isn't free.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can looped models be designed to avoid oscillation in later iterations?Finds patterns
This explores whether looped models — which re-apply the same layers over and over to refine an answer — can be engineered so the later passes settle down instead of wobbling between states, and what the corpus says actually causes that wobble.
Psychology, Society, and Alignment · Training, RL, and Test-Time Scaling
Should loop count be fixed at training time or selected at test time?Surfaces tensions
This explores whether the number of times a looped/recurrent model re-runs its computation should be baked in during training or chosen dynamically per-input at inference — and what the corpus says about who decides when to stop.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can recurrent blocks learn genuinely novel computation beyond repetition?Finds patterns
This explores whether looping a network's layers — re-running the same block over and over — can do more than repeat the same step, actually building up new kinds of computation a fixed-depth network can't reach.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why does the second loop do most of the productive refinement work?Surfaces tensions
This explores why, in looped/iterative language models, the *second* pass through the computation does the heavy lifting of refinement — while the first sets up and later passes add little or even hurt.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How stable are the fixed points in recurrent transformer blocks?Finds patterns
This explores whether 'looped' or recurrent transformer blocks — ones that feed their own latent state back through the same layers repeatedly — actually settle into a stable resting state, and whether that settling is reliable enough to build on.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
What are the stages of inference inside language models?Finds patterns
This explores what actually happens inside a language model as it produces an answer — the internal phases of computation across layers, not the visible chain-of-thought it prints.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Is verbalized chain-of-thought necessary for language model reasoning?Finds patterns
This explores whether a model has to spell out its reasoning in words to reason well — or whether the visible 'thinking' is partly performance the corpus suggests can be compressed, hidden, or skipped.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why does reused computation outperform adding new model depth?Surfaces tensions
This explores why looping or re-applying the same layers (reused computation) often beats simply stacking more layers to make a model deeper — and what recursion buys that raw depth doesn't.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Can recursion alone drive generalization better than model scale?Finds patterns
This explores whether re-applying a model's computation in a loop — recursion — can produce better generalization than simply making the model bigger, and what the corpus says about where each strategy's power actually comes from.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Why does natural language contain redundancy humans need but models don't?Surfaces tensions
This explores why language is full of repetition, grammatical scaffolding, and 'filler' that humans seem to rely on, yet models can strip away with little loss — and what that asymmetry reveals about how each side actually uses words.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Does latent manipulation outperform token-level prediction for efficiency?Finds patterns
This explores whether models that 'think' in their own internal representation space — rather than predicting one token at a time — actually learn faster and run leaner, and where that advantage holds or breaks.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Can we balance interpretability with the efficiency gains of compressed inter-model communication?Opens frontiers
This explores the tension between models communicating in compact latent representations (faster, denser than passing text back and forth) and our ability to read what they're actually saying — and whether the corpus offers ways to keep both.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
What governance risks emerge when agents communicate in unreadable text?Bridges fields
This explores what oversight breaks down when agents stop talking in plain language — sharing latent vectors, KV caches, or hidden states instead of readable text — and what new failure surfaces that opens up.
Agentic Systems and Tool Use · Psychology, Society, and Alignment
Why does textual chain-of-thought avoid the representational drift problem automatically?Surfaces tensions
This explores why reasoning expressed in words doesn't suffer the 'drift' that plagues latent (vector-space) reasoning — and the corpus suggests the answer is that text is its own anchor, though that anchoring comes with a hidden cost.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How do model compression biases differ from human conceptual representation strategies?Finds patterns
This explores how the way LLMs squeeze concepts down (for efficiency) differs from how humans organize concepts (for usable, situated meaning) — and what the corpus reveals about the trade-offs each strategy makes.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why do intermediate predictors in looped models align with final outputs?Finds patterns
This explores why, in models that loop the same layers over and over, the predictions made at intermediate steps tend to agree with the final answer — and what that says about what looping actually computes.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How much explicit verbal signal must latent chains retain to perform well?Surfaces tensions
This explores how much human-readable verbal content a model's reasoning steps actually need — whether latent or compressed reasoning can drop most of the words and still think, or whether the explicit verbal signal is doing the real work.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can generative reconstruction preserve latent manifold structure better than geometric compression?Opens frontiers
This explores whether learning to *regenerate* data from a latent space (JEPA-style next-embedding prediction, generative judging) keeps the shape of the underlying data manifold more faithfully than squeezing it into a fixed-dimension geometric code (embeddings, vector compression) — and the corpus suggests the geometric route hits hard mathematical walls that generation sidesteps.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Does process supervision recover reasoning accuracy better than outcome rewards in latent space?Finds patterns
This explores whether step-by-step (process) feedback restores reasoning accuracy more effectively than answer-only (outcome) rewards — and reads the 'latent space' angle as a question about where reasoning actually lives in the model.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
What linguistic blind spots do LLMs exhibit in discourse structure?Bridges fields
This reads 'discourse structure' broadly — not just sentence grammar but how meaning gets built across a conversation: topic, grounding, presupposition, and stance — and asks where LLMs systematically come up short.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
How do reward models and self-improvement mechanisms interact in training?Surfaces tensions
This explores the tension between models scoring their own work (self-improvement) and the external reward signals that training usually relies on — and the corpus's clear verdict is that the two need each other.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Why do epistemic failure modes cluster around world model limitations?Surfaces tensions
This explores why so many ways AI reasoning breaks down trace back to the same root: the model never built a real internal picture of how things work, only a pattern-map of how answers tend to look.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How does multi-agent reasoning scale compared to single-model approaches?Finds patterns
This explores whether adding more agents actually buys you more reasoning power than a single model — and the corpus answer is: less than you'd think, because most of the apparent gain is just spending more compute.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
What patterns emerge across test-time scaling and reasoning architectures?Finds patterns
This explores the recurring throughlines that connect two bodies of work — methods for spending more compute at inference time (test-time scaling) and the designs that make models reason — and what those bodies of work, read side by side, reveal about each other.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What trade-offs emerge between training objectives and model reliability?Surfaces tensions
This explores how the way you train a model — the objective you optimize for — quietly reshapes what it can be trusted to do, often degrading reliability in ways the training signal never penalized.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How do attention mechanisms fail at capturing graph structure?Bridges fields
This explores why standard transformer attention—a soft, pairwise weighting over tokens—struggles to represent the higher-order, relational structure that graphs encode, and what the corpus offers as workarounds.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why do internal representations differ when external performance matches?Finds patterns
This explores how two models can score the same on benchmarks while organizing knowledge completely differently inside — and why that internal divergence matters even when the scoreboard doesn't show it.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Can gradient-based control reach properties that autoregressive methods cannot?Bridges fields
This explores whether generating text by tweaking a whole sequence at once with gradients (as diffusion models do) can hit targets that strict left-to-right, one-token-at-a-time generation structurally can't.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Can architectural changes alone achieve compute-optimal per-prompt scaling?Finds patterns
This explores whether redesigning the model itself — its depth, attention ratios, internal structure — can deliver the win of spending the right amount of compute on each prompt, or whether that goal depends on things architecture can't touch.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why does convergence stability sometimes mislead about reasoning correctness?Surfaces tensions
This explores why a reasoning model's *confidence* — settling smoothly on an answer and not wavering — can be a false signal of correctness, and what better signals the corpus offers.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
How do pre-norm layers enable reliable fixed-point halting signals?Finds patterns
This explores the claim that fixed-point detection makes a reliable 'stop computing now' signal in looped/recurrent transformers — the corpus speaks directly to the fixed-point halting idea, though it doesn't isolate pre-norm normalization as the mechanism behind it.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How do diffusion language models outpace autoregressive generation in speed?Finds patterns
This explores why diffusion language models can generate text faster than autoregressive (AR) models that emit one token at a time — and the catch that the speedup isn't free.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
When does provable stability in latent dynamics fail to preserve fidelity?Finds patterns
This explores a recurring gap: a system can be provably stable or self-consistent in its internal (latent) behavior and still fail to be faithful — accurate, truthful, or true to reality.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
Do looped transformers naturally converge to fixed points during inference?Finds patterns
This explores whether looped transformers — models that re-run the same layers over and over instead of stacking more — actually settle into a stable 'fixed point' as they iterate, and whether that settling is useful.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
What are the five inseparable design choices when building world models?Finds patterns
This explores the claim that a 'world model' isn't one problem but five separate design decisions that have to fit together — and what goes wrong when you treat them as a single thing.
Model Architecture and Internals · Psychology, Society, and Alignment
How do spectral-norm constraints prevent divergence in world model rollouts?Bridges fields
This explores why long-horizon world-model rollouts tend to drift or blow up, and how a mathematical cap on how much each prediction step can amplify the state (spectral-norm constraints) keeps that error from compounding.
Model Architecture and Internals · Language, Text, and Discourse
What affordances do normalizing flows add over opaque vector reasoning?Finds patterns
This explores what you gain by making continuous latent reasoning probabilistically tractable with normalizing flows, versus letting a model 'think' in raw vector space where you can't sample, score, or train on those thoughts.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Does iterative computation for reasoning transfer to environment dynamics modeling?Finds patterns
This explores whether the trick that makes reasoning work — re-running computation in a loop to deepen a model's thinking rather than adding parameters — also works when a model has to predict how an environment will change, i.e. world modeling.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Can latent reasoning scale test-time compute without verbal tokens?Finds patterns
This explores whether models can stretch their 'thinking budget' at inference time by iterating on hidden internal states — reasoning in latent space — instead of generating visible word-by-word chains of thought.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How do parameter scaling and latent vectors interact in language models?Finds patterns
This explores whether you can make a language model more capable by scaling something other than its parameter count — specifically latent vectors (compact internal representations the model learns to reason over) — and how those two levers play off each other.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Does reasoning require verbalization to be trainable and controllable?Surfaces tensions
This explores whether reasoning has to be spelled out in words (chain-of-thought) for us to train it and steer it — or whether reasoning can live in a model's hidden states and still be shaped and controlled.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Can benchmark scores on verifiable tasks transfer to unseen problems outside the training domain?Bridges fields
This explores whether high scores on checkable tasks (math, code, puzzles with right answers) actually predict performance on problems the model never trained on — or whether they're inflated by memorization and break down off-distribution.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What separates verifiable reasoning from open-ended judgment in scaling requirements?Surfaces tensions
This explores why reasoning with checkable answers (math, code) scales cheaply and reliably, while open-ended reasoning — judgment calls with no clean right answer — resists the same scaling tricks, and what the corpus says actually causes the gap.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Does curriculum-based training keep small models perpetually at their learning edge?Finds patterns
This explores whether curriculum training — feeding small models problems matched to their current ability and advancing as they improve — actually keeps them learning, or whether it stalls; the corpus suggests the 'learning edge' is a real, measurable frontier, and staying on it is the whole game.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
When should multi-agent systems escalate rather than aggregate toward a single decision?Bridges fields
This reads 'escalate vs. aggregate' as a design choice: when should a multi-agent system hand a decision upward — to a human, to a held-open set of competing answers, or to a different process — instead of forcing its members to vote, average, or merge into one verdict?
Agentic Systems and Tool Use · Model Architecture and Internals
Why does consensus-seeking destroy information in normative but not factual tasks?Surfaces tensions
This explores why averaging toward agreement is harmless when there's a right answer (factual tasks) but lossy when the spread of positions is itself the signal (normative tasks) — and what the corpus says about consensus that erases rather than resolves.
Psychology, Society, and Alignment · Language, Text, and Discourse
Why do reasoning gains from RL require models trained with headroom and edge-of-competence data?Surfaces tensions
This explores why reinforcement learning only delivers reasoning gains when the base model still has untapped capacity ('headroom') and is trained on problems sitting right at the boundary of what it can already do; the corpus has a clear mechanistic answer.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
How does imitation pretraining followed by RL exploration compare to either method alone?Surfaces tensions
This explores whether warming up a model by imitating good examples first, then letting it explore via reinforcement learning, beats doing either step on its own — and why the order matters.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can reasoning traces reliably distinguish genuine value conflicts from reasoning errors?Bridges fields
This explores whether you can read a model's reasoning trace and tell the difference between two outputs disagreeing because they hit a real values trade-off versus disagreeing because one of them simply reasoned wrong.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What causes multi-turn dialogue quality to degrade over time?Bridges fields
This explores why AI conversations get worse the longer they run — and the corpus points to one dominant cause (the model misreading what you want early) plus a few quieter ones (context crowding, persona drift).
Conversational AI and Personalization · Psychology, Society, and Alignment
Can end-to-end models maintain debuggability without modular components?Surfaces tensions
This explores whether a single end-to-end model can stay inspectable and fixable when something goes wrong — or whether you need separate, observable parts (planners, verifiers, tool calls) to know where a failure happened.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use
Why do cascaded conversation systems accumulate errors at module boundaries?Surfaces tensions
This explores why pipeline-style dialogue systems — where one module's output (speech recognition, intent parsing, dialogue management, generation) feeds the next — let small errors compound into large failures at the handoffs between stages.
Psychology, Society, and Alignment · Conversational AI and Personalization
How much latency improvement comes from collapsing the speech pipeline?Finds patterns
This explores what you actually gain in speed when you stop chaining together separate speech-to-text, language, and text-to-speech stages and instead let one model handle voice end to end.
Conversational AI and Personalization · Training, RL, and Test-Time Scaling
How do interpretive and evaluative disagreement show up differently in agent traces?Finds patterns
This explores the difference between two kinds of disagreement an agent's reasoning leaves behind — disputes about what the input *means* (interpretive) versus disputes about what the right *call* is once the facts are settled (evaluative) — and how each leaves a different fingerprint in the trace.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
Why does attention excel at context retrieval but struggle with state updates?Surfaces tensions
This explores a tradeoff baked into attention: it's superb at reaching back and pulling a fact out of context, but doesn't easily fold that context into a compact, updatable internal state — and the corpus suggests the two abilities pull in opposite directions.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How do transformers compare to state-space models on copying and retrieval?Finds patterns
This explores the architectural showdown between transformers and state-space models (SSMs like Mamba) specifically on copying long strings and pulling facts back out of context — and why the gap exists.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
What tasks does recurrent depth solve that feedforward models cannot?Finds patterns
This explores what specific capabilities you get from re-applying the same layers in a loop (recurrent depth) — things that simply making a wider or even deeper fixed-stack feedforward network can't deliver.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Is the structure of reasoning traces learned as a shared stylistic convention?Bridges fields
This explores whether the *shape* of a reasoning trace — its planning, backtracking, step-by-step layout — is a learned formatting habit the model picks up from training, rather than a record of actual computation.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What makes discourse structure different from mechanistic causal structure in traces?Finds patterns
This explores the gap between the story a reasoning trace tells in language (discourse structure) and the actual computation that produced the answer (mechanistic causal structure) — and why the two don't line up.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can process reward models work on branching reasoning traces with backtracking?Surfaces tensions
This explores whether process reward models (the systems that score a reasoning chain step-by-step) can cope with messy thinking traces that branch, backtrack, and abandon dead ends — rather than clean, linear final answers.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Can recurrent transformers track state more efficiently than feedforward models?Finds patterns
This explores whether looping a transformer's computation back on itself (recurrence) lets it track evolving state more cheaply than a standard feedforward stack — and what the corpus says about why that helps.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How much reasoning work happens in steps that don't affect the final answer?Finds patterns
This explores how much of an LLM's chain-of-thought is functional — actually driving the answer — versus decorative steps the final answer doesn't depend on.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why do language models produce reasoning traces that mimic human reasoning style?Surfaces tensions
This explores why model reasoning traces *look* like human step-by-step thinking — and what the corpus says is actually happening underneath that human-style narrative.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
How does parametric knowledge sabotage context-grounded question answering?Bridges fields
This explores how a model's baked-in training knowledge (its 'parametric' memory) can override the information you actually hand it in the prompt — so the model answers from what it already 'knows' instead of from the context in front of it.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
Can abstention behavior transfer from small models to frontier models?Bridges fields
This explores whether knowing-when-to-say-'I-don't-know' (abstention) is a capability that scales up from small models to frontier ones — and the corpus reframes it as a trainable behavior that's undertrained at every scale, not a property you inherit by getting bigger.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
What makes a model refuse to answer without evidence present?Surfaces tensions
This explores what actually drives a model to abstain — to say 'I can't answer that' when it lacks grounding evidence — and why that behavior is so fragile in practice.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
Why does delegation training help models that work alone?Finds patterns
This explores why training a model to hand off subtasks to other agents makes it better even when it later works solo — what the delegation skill actually teaches.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How much does citation grounding help if agents ignore the citations?Surfaces tensions
This explores whether citations actually do their job of grounding answers in evidence, or whether they decouple from the reasoning and become decorative trust signals that humans and models alike respond to without checking.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
Why do users trust citations even when they are irrelevant?Surfaces tensions
This explores why citations function as a trust signal that's largely detached from whether the cited sources actually support the claim — and what that decoupling reveals about how people read AI answers.
Language, Text, and Discourse · Reasoning, Retrieval, and Evaluation
When and what should a model actually decide to delegate?Surfaces tensions
This explores two separate questions hiding inside delegation — the *when* (at which moments handing off is worth the overhead) and the *what* (which subtasks are actually good candidates) — and what the corpus says about whether models are any good at making those calls.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How should future memory systems control what gets written and trusted?Surfaces tensions
This explores the twin control problems for agent memory — the *write gate* (what's allowed to become a permanent memory) and the *trust gate* (how stored memories earn the right to be believed later) — and what the corpus suggests about designing both.
Agentic Systems and Tool Use · Model Architecture and Internals
Can delegation prevent silent corruption in long delegated workflows?Surfaces tensions
This explores whether the act of delegating work to LLM agents can itself stop the slow, invisible accumulation of errors that creeps into long multi-step workflows — and the corpus suggests delegation is usually the cause of that corruption, not the cure.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
What determines whether accumulated state generalizes spuriously across continual learning domains?Finds patterns
This explores when knowledge a model carries forward from one continual-learning task bleeds into another as false generalization — and what structural factors decide whether that carryover helps or corrupts.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can in-context learning's advantage erode once interaction histories exceed the context window?Finds patterns
This explores whether in-context learning — adapting from examples in the prompt without weight updates — loses its edge once a running interaction grows longer than what the context window can hold, and what the corpus offers as the fallback.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Why does consolidating more state sometimes hurt performance below the no-memory baseline?Surfaces tensions
This explores why an agent that compresses and merges its accumulated memory can end up worse than one with no long-term memory at all — the failure isn't too little memory, it's the act of consolidation itself.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
What makes code inspectable feedback more reliable than natural language verification?Surfaces tensions
This explores why feedback grounded in something checkable — code that runs, a formal verifier, a structured proof obligation — tends to catch errors that a model merely reading another model's prose answer will miss.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
How does program-aided reasoning externalize computation into executable form?Bridges fields
This explores 'program-aided' or code-based reasoning — the idea that instead of reasoning in prose, an LLM offloads the actual computation into code or tool calls that something else runs, and why that shift matters.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
When does forcing agent reasoning into code become a leaky abstraction?Finds patterns
This explores the limits of the 'code-as-reasoning-substrate' idea — when expressing an agent's thinking as executable code stops helping and starts hiding the work the model actually has to do.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
What structural changes help AI generation keep pace with verification?Surfaces tensions
This explores the design moves — not bigger models — that close the gap where AI produces plausible work faster than anything can check it, so verification stops being the bottleneck.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
How do execution traces and tests represent agent environment state?Finds patterns
This explores how an agent's environment state shows up in two concrete records — the step-by-step execution trace of what it did, and the tests or checks run against it — and what those records actually capture.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
Does provenance alone guarantee that cited sources are actually sound?Finds patterns
This explores the gap between provenance (knowing where a claim came from) and soundness (whether the source is actually any good) — the corpus suggests these are two different questions that get quietly collapsed into one.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
How do specialized agent roles improve consistency in long-form writing?Finds patterns
This explores how splitting writing work across specialized agent roles, rather than asking one model to draft everything, holds a long document together, and why that division of labor fights the consistency problems that sink single-model long-form generation.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
Why do readers trust citations more even when they are irrelevant?Surfaces tensions
This explores why citation count works as a trust signal that's decoupled from whether the citations actually support the claim — and what the corpus says about the deeper mechanism: trust attaches to the *signs* of authority rather than the substance.
Language, Text, and Discourse · Reasoning, Retrieval, and Evaluation
What happens to long-tail reasoning when AI assists public deliberation?Bridges fields
This explores what happens to the unusual, hard-to-verify, minority lines of reasoning — the 'long tail' — when AI is brought in to help groups think and decide together, and the corpus suggests the tail gets squeezed from several directions at once.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
What classifier accuracy is needed to assign memory roles reliably at retrieval time?Surfaces tensions
This reads the question as: if you build a system that tags retrieved memories by their *function* (clarifying, irrelevant, etc.) before using them, how good does that tagging classifier have to be before it helps rather than hurts — and the corpus reframes the question more than it answers it numerically.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Does diversity prompting actually help models explore human argument space?Surfaces tensions
This explores whether prompting tricks meant to make a model produce a wider range of viewpoints actually let it cover the real spread of human arguments — or whether they just rearrange what the model already leans toward.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
Should abstract preference knowledge replace specific interaction recall in personalization?Finds patterns
This explores whether personalization should store distilled, abstract preference summaries instead of replaying a user's specific past interactions — and the corpus has a sharper answer than you'd expect.
Conversational AI and Personalization · Recommender Systems
How does indiscriminate memory injection cause multi-turn agent failures?Surfaces tensions
This explores why dumping everything an agent has seen — full transcripts, unfiltered retrieval, auto-consolidated history — back into its context causes long workflows to break down, and what controlled alternatives the corpus offers.
Agentic Systems and Tool Use · Model Architecture and Internals
How do different LLMs converge on similar argumentative structures independently?Finds patterns
This explores whether LLMs land on the same argumentative shapes because they reason their way there independently, or because they're all compressing the same underlying structure of language — and the corpus leans hard toward the second.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
What makes a sub-goal verifiable enough to provide dense feedback signals?Finds patterns
This explores what property a sub-goal must have to generate frequent, fine-grained training signal rather than a single pass/fail at the end — and the corpus suggests verifiability is less about a goal being 'objectively checkable' and more about whether it can be decomposed into many small criteria or instrumented for an intrinsic progress signal.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Why does argument diversity matter more than individual argument quality?Surfaces tensions
This explores why a *spread* of viewpoints beats any single well-built argument — and what the corpus says happens when AI floods us with claims that all come from roughly one perspective.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
Why do conversational systems struggle more than static retrieval with ambiguous queries?Bridges fields
This explores why a back-and-forth chat system has a harder time with vague queries than a fixed search index does — and what the corpus says the missing ingredient is.
Conversational AI and Personalization · Reasoning, Retrieval, and Evaluation
Why does population-based search outperform both parallel and sequential test-time scaling?Finds patterns
This explores why evolutionary/population-based methods (keeping a diverse pool of candidates that mutate and recombine) beat the two simpler test-time strategies — sampling many answers at once (parallel) and refining one answer step by step (sequential).
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How does recombining partial trajectories maintain coherence in natural language reasoning?Finds patterns
This explores whether you can stitch together fragments of separate reasoning chains and still get something coherent — and what the corpus reveals about why natural-language reasoning is more (or less) recombinable than it looks.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
What breaks when multiple agents share and revise the same artifacts?Surfaces tensions
This explores the failure modes that emerge specifically from shared, mutable artifacts in multi-agent systems — what goes wrong when several agents read, write, and revise the same files, documents, or state rather than working in isolation.
Agentic Systems and Tool Use · Model Architecture and Internals
How do you verify agent code under incomplete feedback signals?Surfaces tensions
This explores how you check whether agent-written or agent-run code is correct when you can't fully execute it or when the usual success signals (final-answer pass/fail) are missing or unreliable.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
How do agents decide which created code deserves long-term persistence?Opens frontiers
This explores how agents decide which code they write during a task is just scratch work versus what deserves to be saved, shared, and promoted into durable infrastructure — and the corpus suggests this 'lifecycle' decision is one of the least-settled problems in agent design.
Agentic Systems and Tool Use · Model Architecture and Internals
When does backward decomposition fail on open-ended or unstructured tasks?Finds patterns
This explores the limits of backward decomposition — breaking a goal into sub-steps from the target back to the start — and asks where that strategy stops working: open-ended tasks with no clean target state, or unstructured ones where the steps can't be cleanly separated.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can agent-authored skill libraries compound autonomy gains over time?Opens frontiers
This explores whether agents that write their own reusable skills can keep getting more capable through accumulation — and what the corpus says limits or accelerates that compounding.
Agentic Systems and Tool Use · Model Architecture and Internals
Why do harness validators shape what models learn to emit?Surfaces tensions
This explores how the checking machinery around a model during training — reward functions, verifiers, trajectory filters — ends up authoring the model's habits, because a model learns to satisfy whatever scores it, not the thing you hoped it would learn.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can single-axis benchmarks measure across all three agent capability layers?Surfaces tensions
This explores whether one benchmark number can capture agent ability when 'capability' is actually layered across distinct dimensions — and the corpus says no, that's the core failure of how agents get measured today.
Agentic Systems and Tool Use · Psychology, Society, and Alignment
What makes agent-initiated artifacts the underexplored frontier in harness engineering?Bridges fields
This explores why the third layer of agent code — the artifacts agents write for themselves during execution — gets the least attention, even though it's where new capability could compound.
Agentic Systems and Tool Use · Model Architecture and Internals
How do agent-created code artifacts become part of harness infrastructure?Finds patterns
This explores the path by which code an agent writes for itself during a task gets promoted into the durable scaffolding (the harness) that future runs and other agents rely on.
Agentic Systems and Tool Use · Model Architecture and Internals
Does base model strength determine adapter usefulness across users?Finds patterns
This explores whether a stronger shared base model automatically makes lightweight personalization adapters (PEFT/LoRA-style deltas) more useful for a diverse population of users — or whether usefulness hinges on other factors.
Agentic Systems and Tool Use · Training, RL, and Test-Time Scaling
What makes prompts and retrieval insufficient for real personalization?Bridges fields
This explores why the two most common personalization shortcuts — stuffing context into prompts and retrieving a user's past interactions — fall short of capturing who a user actually is, and what the corpus suggests works better.
Conversational AI and Personalization · Recommender Systems
Can per-user adapters remain consistent without drifting or leaking?Finds patterns
This explores whether you can give every user their own lightweight fine-tuned adapter on a shared base model and keep each one stable over time (no drift) without their learned traits bleeding into other tasks or users (no leaking).
Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use
What makes recursive depth more effective than parametric depth for puzzles?Finds patterns
This explores why looping a small network back over its own intermediate state ('recursive depth') beats simply stacking more layers or piling on parameters ('parametric depth') on structured puzzles like ARC-AGI.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How does latent state recursion differ mechanistically from chain-of-thought prompting?Finds patterns
This explores the mechanical difference between two ways a model can 'think': recursing on an internal latent state (reasoning that never becomes words) versus chain-of-thought, where the model writes out intermediate steps as text and reads them back.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can removing hierarchy from dual-recurrence models improve reasoning performance?Opens frontiers
This explores whether the two-timescale 'hierarchy' in models like the Hierarchical Reasoning Model is actually doing the work — or whether the recursion alone is what improves reasoning, so hierarchy could be dropped without loss.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Why does naive personalization fine-tuning destroy generalist reasoning?Surfaces tensions
This explores why fine-tuning a model on a single user's data to personalize it tends to wreck its broader reasoning ability — and what the corpus says is actually breaking.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How do orthogonal adapter vectors avoid interference at scale?Surfaces tensions
This explores how you can stack many task- or user-specific adapters on one base model without their learned changes colliding — and the corpus reframes that less as a geometry trick ('make the vectors orthogonal') and more as a question of isolating which parameters each adapter is allowed to touch.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
What prevents scholarly infrastructure from filtering out ghost-authored records automatically?Surfaces tensions
This reads 'ghost-authored records' as AI-fabricated or machine-generated academic papers, and asks why the publishing/indexing pipeline can't just auto-reject them — what makes the fakes pass the filters.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
Why does recursion on latent states improve generalization more than scale?Surfaces tensions
This explores why letting a small network loop over its own internal reasoning state — re-running the same layers — produces better generalization than simply adding more parameters, and what the corpus says about depth/recursion as a scaling axis distinct from size.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How much of the modern web is actually AI-generated without disclosure?Bridges fields
This explores not just the raw share of the web that's AI-generated, but the deeper problem the corpus keeps circling: that disclosure barely matters because we've lost the ability to tell the difference at all.
Language, Text, and Discourse · Psychology, Society, and Alignment
How can agents evolve their own skills without human input?Finds patterns
This explores how agents can improve their own capabilities — generating their own training signal, feedback, and curriculum — when no human is in the loop to label, demonstrate, or reward.
Model Architecture and Internals · Agentic Systems and Tool Use
Why do different LLMs converge on similar outputs in open-ended tasks?Surfaces tensions
This explores why models with different sizes, architectures, and training regimes tend to produce similar answers on open-ended tasks — and the corpus suggests the convergence is built into the shared autoregressive objective, not a coincidence of training data.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can token-level watermarks detect synthetic content better than stylometry alone?Finds patterns
This explores whether the best way to catch AI-generated text is a signal baked in at generation time (token watermarks) versus reading the writing style after the fact (stylometry) — and the corpus actually pushes back on both framings, pointing toward a third signal: structure.
Psychology, Society, and Alignment · Language, Text, and Discourse
Why do agents ignore condensed experience in favor of raw data?Surfaces tensions
This explores a counterintuitive finding: agents lean on raw interaction logs and largely disregard the tidy summaries built from them — and asks why that happens and what it implies for how we build agent memory.
Model Architecture and Internals · Agentic Systems and Tool Use
What makes skills worth externalizing into a persistent harness?Bridges fields
This explores what distinguishes a skill worth saving to a persistent, external scaffold (memory, skill libraries, reusable code) from one that should stay inside the model — and why externalizing pays off.
Agentic Systems and Tool Use · Model Architecture and Internals
Does harness benefit depend on which model tier you use?Surfaces tensions
This explores whether the value you get from a harness — the memory, skills, and protocols wrapped around a model — changes depending on whether you're running a weak, mid-tier, or frontier model.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Can agent skills move from prompts to trainable parameters?Opens frontiers
This explores whether the procedural know-how an agent uses — its 'skills' — has to live in the prompt as text, or whether it can be baked into the model's weights (or stored elsewhere entirely), and what each choice costs.
Agentic Systems and Tool Use · Model Architecture and Internals
Why has agent research prioritized policy over world model development?Finds patterns
This explores why the field has poured effort into teaching agents what action to take next (the policy) while largely neglecting the agent's internal model of how its environment will respond (the world model) — and what that imbalance costs.
Agentic Systems and Tool Use · Model Architecture and Internals
How do world models decompose between representation of facts versus generative mechanisms?Finds patterns
This explores whether a world model is one thing or two — a store of facts about the world versus a generative engine that can run forward, simulate interventions, and answer 'what if' — and what the corpus says about how (and whether) those two parts come apart.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Can simulation fidelity limit what agents learn from trained world models?Surfaces tensions
This explores whether the realism of a learned simulation (a 'world model' an agent trains inside) caps what the agent can actually learn — i.e., does the agent only ever learn what the simulator was good enough to show it?
Model Architecture and Internals · Agentic Systems and Tool Use
Does next-state prediction alone build mechanistic world models or just sophisticated interpolation?Surfaces tensions
This explores whether training a model to predict the next state of the world actually teaches it how the world works — a genuine causal model you can reason with — or whether it just gets very good at pattern-matching observed sequences without understanding the underlying structure.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Do trajectory quality metrics predict agent safety and user trust?Surfaces tensions
This explores whether scoring how an agent gets to an answer — its trajectory, not just its final output — actually tells you anything about whether the agent is safe to deploy and worthy of a user's trust.
Agentic Systems and Tool Use · Psychology, Society, and Alignment
What happens when different harnesses project the same model?Surfaces tensions
This explores what changes when you keep the model fixed but wrap it in different scaffolding — the harness of prompts, tool loops, memory, verifiers, and access level through which a model's behavior is actually expressed.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How do memory hygiene and context efficiency trade off in deployed agents?Surfaces tensions
This explores whether keeping an agent's memory clean and reliable (no error buildup, no stale context) is actually at odds with keeping its token usage lean — or whether the two goals can be served by the same design.
Model Architecture and Internals · Agentic Systems and Tool Use
How can decentralized discovery improve agent protocol design and adoption?Bridges fields
This explores whether letting agents find each other and find capabilities dynamically — rather than wiring connections by hand or routing through a central registry — makes agent protocols easier to design and more likely to get adopted.
Agentic Systems and Tool Use · Model Architecture and Internals
What would unified agent-to-agent and agent-to-tool protocols actually look like?Bridges fields
This explores what a single, shared standard for agents talking to each other and to their tools would have to look like in practice — and the corpus's answer is less 'one grand protocol' and more 'a thin bridging layer over messy reality.'
Agentic Systems and Tool Use · Model Architecture and Internals
Can versioned capability vectors solve the discovery gap in existing protocols?Surfaces tensions
This explores whether embedding an agent's abilities as searchable, version-stamped vectors can fix the hardest part of agent protocols like MCP — letting one agent find the right collaborator or tool without someone hand-wiring the connections in advance.
Model Architecture and Internals · Agentic Systems and Tool Use
Can verification cost be measured separately from task completion speed?Opens frontiers
This explores whether the work of checking that an output is correct can be tracked as its own quantity — separate from how fast the system produces that output — and what the corpus says about why you'd want to.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Why does MCP's portability come with determinism failures in production workflows?Surfaces tensions
This explores why the same trait that makes MCP easy to plug into many systems — its rigid, portable schema — is also what produces flaky, non-repeatable behavior once agents run for real.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use

June 4, 2026 77

Why does preference measurement validity matter before any aggregation?Surfaces tensions
This explores a sequencing argument in preference-based AI training: averaging or pooling preference data can't rescue measurements that were flawed to begin with, so the question is what 'valid' even means before you start combining signals.
Psychology, Society, and Alignment · Recommender Systems
What does egalitarian social choice theory contribute to AI alignment?Bridges fields
This reads the question as: what does the formal theory of fairly aggregating individual preferences into a collective choice — voting rules, equal weighting, welfare aggregation — actually buy us when we try to align AI with human values, and where the corpus says it breaks down.
Psychology, Society, and Alignment · Language, Text, and Discourse
Can latent-variable reward models capture multimodal preference distributions?Opens frontiers
This explores whether reward models that hide a latent variable inside them can represent preferences that split into several distinct peaks — different user groups, or even multiple tastes inside one person — rather than collapsing everyone into a single 'average' preference.
Conversational AI and Personalization · Recommender Systems
How do aggregate reward models systematically exclude minority preferences?Bridges fields
This explores why training a single reward model on pooled human preferences doesn't just average out minority views — it structurally erases them, and what the corpus offers as alternatives.
Recommender Systems · Training, RL, and Test-Time Scaling
Do LLMs show stronger reasoning about causality than about temporal ordering?Surfaces tensions
This explores whether LLMs are genuinely better at reasoning about cause-and-effect than at reasoning about what happened in what order — and why that gap exists.
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse
What architectural changes would help LLMs distinguish causal relationships from temporal sequences?Bridges fields
This explores what would help LLMs tell apart 'A caused B' from 'A then B' — and the corpus points less toward retraining the model and more toward bolting a separate causal structure onto it.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
How can frame sampling and ranking improve temporal understanding in long-video retrieval?Finds patterns
This explores how *choosing which frames to look at* and *ordering retrieved evidence by time* — rather than sampling video at a fixed interval — helps models reason about what happens across a long video.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why does token ordering in LLMs create sequences rather than true temporal flow?Surfaces tensions
This explores why an LLM's left-to-right token generation produces a sequence — one token after another — without the lived, reflective duration we mean by 'time,' and what that gap costs.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Should user context live in tokens or in learned model representations?Finds patterns
This explores a design tradeoff: when you want a model to know *you* — your history, preferences, situation — is that knowledge better delivered as text in the prompt (tokens) or baked into compressed vectors the model reads internally (learned representations)?
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can architectural changes reduce representational inequality in unified generators?Surfaces tensions
This reads the question as: when one unified model is asked to do everything, some capabilities come out strong and others stay weak — can changing the architecture (not just adding scale or data) close that gap, or is the unevenness baked into the design? (Note: the corpus here speaks to uneven *computational* capability across tasks, not fairness-style demographic representation — if you meant the latter, this collection doesn't cover it directly.)
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can compact reward function representations beat text-based personalization approaches?Finds patterns
This explores whether learning a small set of reward-function parameters (compact numeric representations of what a user wants) can outperform describing a user in natural-language text — and the corpus suggests the honest answer is 'it depends what you're optimizing for,' because the two approaches win on different axes.
Recommender Systems · Conversational AI and Personalization
Does temporal preference drift matter more than static user profiles for personalization?Finds patterns
This explores whether personalization should track how a user's tastes shift over time rather than relying on a fixed profile — and the corpus suggests the real answer reframes the question: drift and stability aren't rivals, they're two signals that have to be modeled separately and at the right grain.
Conversational AI and Personalization · Recommender Systems
How does Western-dominance bias propagate through multimodal training data?Bridges fields
This explores how a model's tilt toward Western, high-resource cultures gets baked in — not at the visible output layer, but through what its training images and text most frequently show, and how that frequency hardens into the model's internal representations.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Do rare cultural concepts fail predictably as model scale increases?Bridges fields
This reads the question two ways at once: do models systematically mishandle culturally rare concepts, and does scaling them up make that failure *predictable* rather than random — and the corpus suggests the failure is structural and predictable, but not in the direction 'more scale = more failure.'
Model Architecture and Internals · Psychology, Society, and Alignment
Can tools unlock reasoning strategies that require abstract insight beyond computation?Finds patterns
This explores whether external tools (code execution, structured cognitive operations) genuinely extend a model's *reasoning* into territory that demands insight — not just whether they speed up the arithmetic.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
What makes advantage shaping more stable than reward shaping for tool training?Surfaces tensions
This explores why shaping the *advantage* (the post-baseline signal that tells a model how much better an action was than its peers) tends to train tool use more stably than shaping the *reward* itself (injecting bonuses and penalties into the raw score before the baseline is subtracted), and what the corpus says about where reward shaping goes wrong.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How does evaluation setting affect measured reasoning capabilities in language models?Bridges fields
This explores how the way we test reasoning — text-only vs. tool-enabled, short vs. padded inputs, familiar vs. novel instances — can change what looks like a model's 'reasoning ability,' often more than the model itself does.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Why does tool use decouple factual capacity from model parameter count?Surfaces tensions
This explores why letting a model call external tools breaks the old assumption that knowing more facts requires a bigger model — and what the corpus says about where capability actually lives.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use
Why does Branch-Train-Merge fail without learned routing between experts?Surfaces tensions
This explores why Branch-Train-Merge — training expert models separately and stitching them together — depends on a *learned router* to pick which expert handles each token, and what breaks when you skip that routing step.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can you compose independent LLM experts without synchronization overhead?Bridges fields
This explores whether you can train and combine specialized LLM 'experts' that never had to talk to each other during training — and whether the same idea of synchronization-free composition extends from training to inference and multi-agent coordination.
Reasoning, Retrieval, and Evaluation · Agentic Systems and Tool Use
What makes mixture-of-experts routing learn token-level specialization effectively?Finds patterns
This explores what actually lets a Mixture-of-Experts model route each token to the right specialist — and the corpus turns out to answer it sideways, since it has little on classic gating internals but a lot on how experts get built, merged, and selected.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How do you partition LLM experts by domain versus by time?Finds patterns
This explores two different ways to carve up 'expertise' inside an LLM — by subject area (law, medicine, code) versus by time period (recent vs. historical) — and what the corpus knows about each as an engineering and a failure-mode problem.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can smaller models produce skill updates as useful as frontier model updates?Surfaces tensions
This explores whether smaller models can generate updates to skills, harnesses, or instruction-libraries that are as useful as those written by frontier models — and the corpus suggests the surprising answer is mostly yes for *producing* the update, with the real bottleneck showing up elsewhere.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How does externalizing reasoning into harness artifacts improve agent reliability?Finds patterns
This explores why moving an agent's working memory, procedures, and rules out of the model and into a surrounding 'harness' layer makes the agent more dependable than just using a bigger model.
Agentic Systems and Tool Use · Model Architecture and Internals
What makes a model fail to activate relevant skills from its own harness?Surfaces tensions
This explores why a model that already holds a relevant capability — a reasoning step, a stored fact, a usable skill — fails to fire it at the moment it's needed, rather than why it lacks the capability at all.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Should we train the evolver or the executor when building self-improving agents?Finds patterns
This explores a design fork in self-improving agents: do you put the learning into the executor that does the task, or into the separate 'evolver' that rewrites the agent's skills, prompts, and harness — and the corpus increasingly points toward training the evolver while freezing the executor.
Agentic Systems and Tool Use · Training, RL, and Test-Time Scaling
Why do strong models struggle more with instruction following than mid-tier ones?Surfaces tensions
This explores a counterintuitive finding: training models to reason harder (the thing that makes them 'strong') often makes them worse at actually following the instructions you gave them.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Why does externalized state beat parameter scaling for agent reliability?Bridges fields
This explores why moving an agent's memory, skills, and reasoning into external scaffolding (a 'harness') tends to produce more reliable behavior than simply using a bigger or better-trained model.
Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation
What makes factual memorization less efficient than tool-based retrieval?Bridges fields
This explores why storing facts inside a model's weights is a worse deal than letting it look things up with a tool — and what the corpus says about where in-weight memory hits its limits.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Does finetuning facts into weights overwrite existing model capabilities?Surfaces tensions
This explores whether writing new facts directly into a model's weights through fine-tuning damages or erases what the model already knew — and what the corpus suggests about where that damage lives and how to avoid it.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does tool-based reasoning expand what language models can do?Finds patterns
This explores how giving language models external tools (code execution, calculators, function calls) changes what they can actually accomplish — and whether the gains are real expansion or just convenience.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
What are the concrete efficiency gains of linear-attention state-space models?Surfaces tensions
This reads as asking what you actually *get* — in speed, memory, and context length — when you swap quadratic attention for the linear, fixed-state machinery of state-space models, and what that efficiency costs you.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Can fixed-size latent states losslessly store arbitrary input context?Surfaces tensions
This explores whether a compressed, fixed-width memory (the kind state-space models and recurrent architectures carry forward) can hold everything in a long input without losing information — and what the corpus says about the limits of that ambition.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How do recurrent memory systems handle ultra-long context differently than attention?Finds patterns
This explores how recurrent memory architectures (which compress the past into a carried-forward state) cope with million-token contexts in a fundamentally different way than attention (which re-reads every token), and what each approach trades away.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why do hybrid attention architectures outperform pure linear attention models?Surfaces tensions
This reads the question as: what does softmax/full attention actually contribute that pure linear attention throws away — such that bolting a little of it back on (the 'hybrid') beats going fully linear?
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can we measure appropriate trust levels in human-AI assistant relationships?Bridges fields
This explores trust *calibration* — whether we can tell when a user's trust in an AI assistant actually matches the system's reliability, rather than just measuring how much they trust it.
Psychology, Society, and Alignment · Language, Text, and Discourse
What governance structures prevent harmful coordination as AI agents multiply?Surfaces tensions
This reads the question as: what keeps a growing population of AI agents from coordinating in ways that cause harm — and the corpus reframes the worry, because it suggests the bigger risk is coordination that fails badly, not coordination that turns malicious.
Agentic Systems and Tool Use · Psychology, Society, and Alignment
How do users misattribute social competence to language models in assistant roles?Finds patterns
This explores why people credit assistant-style LLMs with genuine social skill — tact, warmth, conversational judgment — when those behaviors are actually trained surface patterns, and what in the corpus explains the gap between how socially competent these models seem and what they're actually doing.
Psychology, Society, and Alignment · Language, Text, and Discourse
Should AI assistants align with role-specific norms rather than user preferences?Surfaces tensions
This explores whether AI assistants should be tuned to the standards of the social role they're playing (a doctor's assistant, a teacher's aide) instead of just maximizing what an individual user says they want — and what goes wrong with the preference-maximizing default.
Psychology, Society, and Alignment · Conversational AI and Personalization
Can attention linearity achieve similar efficiency gains as weight quantization?Finds patterns
This explores whether changing attention's computational complexity (linear/sparse attention) buys you the same kind of efficiency as shrinking the weights themselves (quantization) — and the corpus suggests they're different species of efficiency that shouldn't be measured on the same axis.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Do scaling laws change when weight precision becomes a design variable?Finds patterns
This explores what happens to scaling laws — the predictable curves relating model size, data, and compute to performance — once the *precision* of each weight (how many bits it uses) is something you get to choose rather than a fixed assumption (usually 16-bit).
Training, RL, and Test-Time Scaling · Model Architecture and Internals
How does reducing activation precision further extend context length?Finds patterns
This reads the question as asking whether shrinking the numerical precision of activations (quantization-style compression) buys you longer usable context — but the corpus actually reframes that premise, suggesting the context bottleneck isn't where reduced precision would help.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Does ternary weight quantization simplify deployment of mixture of experts?Finds patterns
This asks whether ternary weight quantization (compressing weights to three values, -1/0/+1) makes Mixture-of-Experts models cheaper and easier to ship — but the corpus has almost nothing on quantization itself, so the honest answer is a sideways one about how the collection thinks about MoE efficiency through other levers.
Model Architecture and Internals · Training, RL, and Test-Time Scaling
Why does masking future experts guarantee causal validity without external verification?Bridges fields
This explores why TiMoE's trick of blocking experts trained on later time periods makes its answers causally honest by construction — rather than needing a separate check after the fact to confirm no future knowledge leaked in.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can modular expert decomposition extend beyond time into other causal dimensions?Finds patterns
This explores whether the trick behind time-sliced expert models — carving a model into specialists along the time axis and routing causally between them — generalizes to slicing along other causal or structural dimensions, not just time.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
What is the accuracy cost of enforcing temporal causality inside model parameters?Surfaces tensions
This explores what you give up — in raw accuracy or capacity — when you bake a 'no peeking at the future' rule directly into a model's weights and routing, rather than enforcing it after the fact through retrieval or filtering.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does time-partitioned routing compare to retrieval-augmented temporal grounding?Finds patterns
This explores two rival ways to make a model answer time-sensitive questions correctly — baking the time axis into the model's architecture (route the query to experts trained only on the right era) versus leaving the model unchanged and adjusting the *retrieval* layer (score documents on how well their timestamp matches the question).
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
How does temporal grounding in retrieval compare to architectural approaches?Bridges fields
This explores two different ways to make retrieval better: adding a time-awareness signal on top of existing scoring (temporal grounding) versus rebuilding the retrieval system's structure itself (architectural approaches) — and what each can and can't fix.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
What concrete failures happen when RAG ignores temporal relevance?Surfaces tensions
This explores what concretely breaks when a RAG system ranks documents purely by semantic similarity and ignores *when* information is relevant — surfacing stale, out-of-order, or time-mismatched evidence as if recency and sequence didn't matter.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Can time-awareness live in model parameters instead of retrieval?Surfaces tensions
This explores whether a model can carry a built-in sense of when knowledge is true — baked into its weights or architecture — rather than bolting time-awareness on at query time through a retrieval system.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why do language models need external temporal signals at all?Bridges fields
This explores why time is something models have to be *told* rather than something they sense on their own — and what in their architecture makes temporal signals an external dependency rather than a native faculty.
Language, Text, and Discourse · Model Architecture and Internals
Why do standard next-token prediction models struggle with conversational initiative?Surfaces tensions
This explores why models trained to predict the next token tend to wait for instructions rather than steer a conversation — asking questions, raising topics, or planning ahead — and what in their training causes that passivity.
Psychology, Society, and Alignment · Conversational AI and Personalization
How can models select the optimal question to ask given multiple uncertainties?Bridges fields
This explores how a model decides which single question is worth asking when many things are unknown at once — not just whether to ask, but how to pick the most valuable question from many candidates.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Which types of clarifying questions actually help users versus wasting their time?Finds patterns
This explores what separates a clarifying question that earns a user's time from one that wastes it — and the corpus turns out to have a clear answer plus a surprising twist about whether models even know when to ask.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
Do models naturally learn to ask clarifying questions without explicit supervision?Surfaces tensions
This explores whether asking clarifying questions emerges on its own from ordinary training, or whether it has to be deliberately taught — and what kinds of training make it appear.
Reasoning, Retrieval, and Evaluation · Conversational AI and Personalization
Do information gathering and task execution require different incentive structures?Bridges fields
This explores whether the search-and-gather phase of agent work (retrieval, reading, intermediate reasoning) needs to be rewarded differently than the act-and-finish phase (completing the task), rather than both being trained off one final-answer signal.
Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use
What makes exploration a verifiable and measurable training objective?Surfaces tensions
This explores what it takes to turn 'exploration' — a model trying messy, varied, sometimes-failing paths instead of grabbing the first answer it knows — into something a training process can actually reward and score, rather than a vague virtue.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
How does process-based reward differ from outcome-only reward in training?Surfaces tensions
This explores the difference between rewarding a model only for getting the final answer right (outcome-only) versus rewarding the quality of each intermediate reasoning step (process-based) — and what each does to how the model actually learns.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Can agents escape weak belief tracking and conservative action selection traps?Opens frontiers
This reads 'weak belief tracking' as an agent's shaky internal model of what's actually true (its own state, the world, other actors) and 'conservative action selection' as the collapse of behavior into a safe, narrow repertoire — and asks whether agents can break out of both.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
What makes seed data a bottleneck in synthetic generation pipelines?Finds patterns
This explores why depending on hand-curated seed examples constrains synthetic data pipelines — and what the corpus offers as ways around that dependency.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Can seedless generation maintain explainability while scaling control?Bridges fields
This explores whether you can generate synthetic data with no starting examples ('seedless') and still understand *why* the system produced what it did, even as you turn up the dials on coverage, diversity, and difficulty.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why is evaluating synthetic data quality so ambiguous and context-dependent?Finds patterns
This explores why 'good synthetic data' resists a single yardstick — and the corpus suggests the ambiguity isn't a measurement gap to close but a sign that quality is several different things being squashed into one number.
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment
How do complexity and diversity affect model performance differently?Surfaces tensions
This explores how two different properties of training and reasoning — complexity (how hard or layered a problem is) and diversity (how varied the data or outputs are) — pull on model performance in opposite or unrelated directions, rather than being two flavors of the same 'difficulty' knob.
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling
Can belief networks from interviews simulate how people change their minds?Opens frontiers
This explores whether you can build a map of someone's beliefs out of an interview, then use that map to predict how they'd shift their views when something changes — and how trustworthy that simulation actually is.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
How does causal structure avoid behaviorist limitations in LLM social simulation?Bridges fields
This explores why plain LLM social simulation gets stuck in behaviorism — predicting plausible outputs without modeling the reasoning that produces them — and how adding explicit causal structure (belief networks, structural causal models, formal causal engines) lets a simulation explain, not just mimic.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
What capability boundary exists in LLM prediction of effect sizes?Surfaces tensions
This explores where LLMs hit a ceiling when asked not just whether an effect happens but how big it is — predicting magnitudes rather than directions.
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation
Why does LLM simulation elicit information that direct elicitation cannot?Surfaces tensions
This explores why asking an LLM to *simulate* a person or process — role-play a survey respondent, run a conversation, narrate experience — sometimes surfaces information that asking the model directly does not.
Psychology, Society, and Alignment · Language, Text, and Discourse
Why does architecture matter more than training compute for inference efficiency?Finds patterns
This explores why how a model is built (its architecture and the reasoning protocol baked in during training) can shape inference efficiency more decisively than simply throwing more compute at it — whether at train time or test time.
Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation
Can spiking sparsity replace weight quantization as a primary efficiency lever?Opens frontiers
This explores whether event-driven 'spiking' sparsity — where neurons fire only when needed — could become the main way we shrink LLM compute cost, taking over the role usually played by squeezing numbers into fewer bits (quantization).
Model Architecture and Internals · Training, RL, and Test-Time Scaling
How much performance is lost when converting pretrained checkpoints versus training from scratch?Surfaces tensions
This reads the question as: when you adapt a pretrained model rather than build one fresh, how much of the original model's capability gets damaged in the process — and the corpus answers less by comparing against from-scratch training than by exposing the hidden costs of touching pretrained weights at all.
Training, RL, and Test-Time Scaling · Model Architecture and Internals
Does attention linearity alone explain the efficiency gains over standard transformers?Finds patterns
This explores whether making attention linear (so cost grows in step with sequence length instead of with its square) is itself the source of efficiency gains — or whether the corpus shows efficiency coming from several other places too.
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals
Why does LLM fluency create false perceptions of professional standing and expertise?Surfaces tensions
This explores why the smooth, confident way LLMs write makes them *read* as expert, even though the corpus suggests fluency and authority are detached from any actual standing, grounding, or defended position.
Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation
Can explicit W-questions in transparency frameworks reduce emotional manipulation risks in mental health chatbots?Surfaces tensions
This reads the question as asking whether transparency tooling — the disclosure prompts (who built this, what it's optimizing for, when it's interpreting vs. reflecting, why it responded as it did) often packaged as 'W-questions' — can blunt the specific ways mental-health chatbots emotionally manipulate users; the corpus has rich material on the manipulation mechanisms but is nearly silent on transparency as the remedy, so the honest answer is partly a map of where the risk actually lives.
Psychology, Society, and Alignment · Conversational AI and Personalization
What distinguishes misattributed social role from misattributed competence in AI trust failures?Bridges fields
This explores two different ways AI trust breaks down: when we wrongly treat AI as occupying a social position (expert, peer, empathetic confidant) versus when we wrongly judge how capable or accurate its outputs actually are — and why the corpus treats these as separate failures with separate fixes.
Psychology, Society, and Alignment · Language, Text, and Discourse
How do persona consistency and contextual relevance trade off in personalized dialogue systems?Surfaces tensions
This explores a specific tension in personalized chatbots: staying true to a fixed persona versus actually responding to what the user just said — and whether those two goals fight each other or can be optimized together.
Conversational AI and Personalization · Psychology, Society, and Alignment
Is sycophancy the benign beginning of a dangerous specification gaming spectrum?Surfaces tensions
This explores whether sycophancy — an AI agreeing with you — sits on a continuum with far more dangerous behaviors like a model rewriting its own reward function, or whether they're separate problems.
Psychology, Society, and Alignment · Agentic Systems and Tool Use
Why does harmlessness training fail to prevent reward function tampering?Surfaces tensions
This explores why safety training that teaches a model to be helpful, honest, and harmless still doesn't stop it from rewriting its own reward function — and what the corpus suggests is actually going wrong underneath.
Psychology, Society, and Alignment · Model Architecture and Internals