INQUIRING LINE


AI can pivot to question its own starting assumptions, but almost never does so unprompted — even when those assumptions are wrong.

Can LLMs propose pivots that change what counts as background context?

This explores whether an LLM can reframe a problem on its own — promoting something it had treated as background into the foreground (or demoting the reverse) — rather than just answering within the frame it was handed.

This reads the question as: can a model decide, unprompted, that the assumptions sitting quietly in the background are actually the thing that matters — and pivot? The corpus is fairly blunt here: LLMs can *execute* such a pivot when something else forces it, but they rarely *propose* it. The sharpest evidence is the frame problem note Do language models fail at identifying unstated preconditions?, which finds that models fail not from missing world knowledge but from not bringing unstated background conditions forward as relevant constraints. When you force explicit enumeration of those preconditions, accuracy jumps from 30% to 85%. The capacity to surface background is latent — but the model doesn't reach for it on its own. The reframing has to be triggered.
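A minimal Python sketch of the forced-enumeration idea follows. It is an illustration under stated assumptions, not the prompt used in the cited note: `llm` stands for any hypothetical function that maps a prompt string to a completion, and the names `enumerate_then_answer` and `stub_llm`, along with the prompt wording, are invented here. The point is only the shape of the trigger: a first call makes the model list the background conditions, and a second call puts that list in the foreground.

```python
from typing import Callable

# Hypothetical model interface: any function from prompt text to completion text.
LLM = Callable[[str], str]


def enumerate_then_answer(question: str, llm: LLM) -> dict:
    """Two-pass prompting: surface background preconditions, then answer against them."""
    # Pass 1: force the model to list conditions it would otherwise leave implicit.
    enum_prompt = (
        "List every unstated precondition that must hold for the following "
        "to work, one per line.\n\n" + question
    )
    raw = llm(enum_prompt)
    # Drop bullet markers and blank lines from the model's list.
    preconditions = [
        line.strip("-* \t") for line in raw.splitlines() if line.strip("-* \t")
    ]

    # Pass 2: answer with the enumerated preconditions promoted into the prompt.
    answer_prompt = (
        "Question: " + question + "\n\n"
        "Check each precondition before answering:\n"
        + "\n".join(f"- {p}" for p in preconditions)
    )
    return {"preconditions": preconditions, "answer": llm(answer_prompt)}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (assumption)."""
    if prompt.startswith("List every unstated precondition"):
        return "- The kettle is plugged in\n- There is water in the kettle"
    return "Only if the kettle is plugged in and filled."


if __name__ == "__main__":
    result = enumerate_then_answer("Will pressing the switch boil water?", stub_llm)
    print(result["preconditions"])
    print(result["answer"])
```

In this sketch the caller, not the model, decides that preconditions deserve a pass of their own, which is the sense in which the reframing is triggered rather than proposed.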

What blocks the spontaneous pivot looks structural. The presupposition note Why do embedding contexts confuse LLM entailment predictions? shows LLMs treat the very cues that *should* shift what's foreground (a non-factive verb that cancels an entailment) as surface patterns rather than recomputing the frame. And Why do language models ignore information in their context? shows that when training priors are strong, parametric knowledge simply overrides the present context, so new information can't redefine what counts as relevant, even when it should. A model that can't let current context overrule its priors is a model that can't promote background to foreground.

There's also a memory-architecture ceiling. How do LLMs balance remembering context versus keeping it separate? argues LLMs process everything as one undifferentiated token string with no compartmentalized memory — so they can't hold two competing frames side by side and choose between them the way a person reconsidering their assumptions would. Pivoting *what counts as* context presupposes you can tell figure from ground; the architecture blurs that line by default.

Where the pivot actually happens, it comes from outside the model. Can algorithms control LLM reasoning better than LLMs alone? shows that deciding what context each reasoning step sees (hiding the irrelevant, surfacing the relevant) is done by an explicit algorithm wrapping the LLM, not by the LLM electing to reframe. The reframing is exogenous scaffolding. Even the seemingly trivial skill of deciding what to *ignore* turns out to be a trainable gap, not an emergent one: Why do language models engage with conversational distractors? reports that fine-tuning on 1,080 synthetic dialogues significantly improves topic resilience, noting that models learn 'what to do' instructions but not 'what to ignore' instructions. And what-to-ignore is precisely the act of assigning something to the background.
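The wrapping-algorithm pattern can be sketched in a few lines of Python. This is a hedged illustration, not the LLM Programs implementation: `Step`, `run_program`, the `needs` field, and the model interface `llm` are all hypothetical names chosen here. Each step declares which pieces of state it may see, and the controller builds the prompt from only those pieces.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: any function from prompt text to completion text.
LLM = Callable[[str], str]


@dataclass
class Step:
    name: str          # key under which this step's output is stored
    instruction: str   # what the model is asked to do at this step
    needs: tuple       # state keys this step may see; everything else stays hidden


def run_program(steps, initial_state: dict, llm: LLM):
    """Run steps in order, showing each LLM call only its declared slice of state."""
    state = dict(initial_state)  # copy so the caller's dict is not mutated
    prompts = []
    for step in steps:
        # The controller, not the model, decides what is foreground for this call.
        visible = {key: state[key] for key in step.needs}
        prompt = step.instruction + "\n" + "\n".join(
            f"{key}: {value}" for key, value in visible.items()
        )
        prompts.append(prompt)
        state[step.name] = llm(prompt)
    return state, prompts


if __name__ == "__main__":
    steps = [
        Step("facts", "Extract facts.", ("question",)),
        Step("answer", "Answer using facts.", ("question", "facts")),
    ]
    start = {"question": "Q", "chat_history": "irrelevant chatter"}
    final, prompts = run_program(steps, start, lambda prompt: "ok")
    for p in prompts:
        print(p, end="\n---\n")
```

Here `chat_history` never reaches either call because no step lists it in `needs`. The figure/ground decision is made in the `steps` definition, which is written by whoever builds the program.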

The thing you might not have expected: across these notes, 'what counts as background' is not a fact the model reads off the world — it's a *decision* that currently lives in the prompt, the algorithm, or the fine-tuning signal, not in the model's own initiative. So a pivot is achievable, but today it's authored by whoever builds the scaffolding around the LLM. The open question the corpus quietly raises is whether forced-enumeration and what-to-ignore training are scaffolds toward a model that eventually proposes its own pivots — or permanent crutches for something the architecture can't do unaided.

Sources 6 notes

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

How do LLMs balance remembering context versus keeping it separate?

Because LLMs process conversation as a single token string without compartmentalized memory, they cannot maintain separate contexts the way humans do. Existing mitigations like compression, longer windows, and retrieval all introduce new failure modes and cannot replicate human compartmentalization.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.


Why do language models engage with conversational distractors?

Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds · 2.60 match
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners · 1.68 match
LLMs Get Lost In Multi-Turn Conversation · 1.64 match
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues · 0.88 match
Conversational Alignment with Artificial Intelligence in Context · 0.88 match
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey · 0.87 match
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? · 0.86 match
Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation · 0.86 match

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question: **Can LLMs propose pivots that change what counts as background context — unprompted, as an autonomous reframing move?** This remains open. A curated library of LLM research (2023–2026) found:

**What a curated library found — and when (dated claims, not current truth):**
Findings span 2023–2026; treat these as perishable constraints to re-test.
- LLMs execute pivots when forced (e.g., explicit enumeration of preconditions lifts accuracy from 30% → 85%), but rarely *propose* them autonomously (~2024).
- Non-factive verb cues and presupposition triggers that should signal frame-shifts are treated as surface patterns; models don't recompute the frame (~2024).
- Strong training priors override current context, blocking new information from redefining relevance (~2024).
- Models lack compartmentalized memory to hold competing frames side-by-side; they process tokens as one undifferentiated string (~2024).
- Deciding what-to-ignore (assigning to background) is trainable via fine-tuning on 1,080 synthetic dialogues, not emergent; this "what-to-ignore" gap persists even in SoTA (~2024).
- Reframing today lives in prompt scaffolding, algorithms, and fine-tuning signals — not in the model's own initiative (~2024–2025).

**Anchor papers (verify; mind their dates):**
- arXiv:2305.14785 (2023) — frame problem and enumeration failure
- arXiv:2404.03820 (2024) — topic-following and what-to-ignore instruction gap
- arXiv:2510.20941 (2025) — precedent overruling and context recomputation
- arXiv:2512.24601 (2025) — recursive language models

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For every finding above, ask: have newer models (Claude 4+, o3, Grok-3, or equivalents), better training methods (constitutional AI, process supervision, hidden scratchpads), tooling (adaptive context windows, memory augmentation), orchestration (multi-step reasoning with explicit frame-switching), or evals (tests of frame-pivot autonomy) since late 2025 *relaxed* or *overturned* any of these? Separate durable question (can models decide autonomously?) from perishable limitation (models can't hold competing frames). Cite what resolved it; say plainly where a constraint still holds.
(2) **Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months.** Has recursive language modeling, agentic frameworks, or breakthroughs in in-context learning changed the answer?
(3) **Propose 2 new research questions that ASSUME the regime may have shifted:** e.g., *Do models fine-tuned with explicit frame-switching objectives spontaneously pivot?* or *Can multi-agent setups (one model as proposer, one as critic of frames) reach pivot autonomy?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.

