INQUIRING LINE

How should headers index procedural intent differently from keyword chunking?

This explores how a document's headers should capture the *purpose* behind a procedure—what a step is for and what it depends on—rather than serving as keyword-matched cut points the way fixed-size chunking does.


This explores how headers should index *procedural intent*—what a step accomplishes and what it presupposes—rather than acting as keyword landmarks the way chunk-based retrieval does. The corpus is unusually direct on this: fixed-size chunking optimizes for surface keyword overlap, which systematically severs the dependencies that make a how-to procedure work. The most pointed answer comes from THREAD's four-part logic units How do logic units preserve procedural coherence better than chunks?, where a header is no longer just a topic label but one slot in a structure—prerequisite, header, body, linker. The header names the step, the prerequisite encodes what must already be true, and the linker explicitly routes to the next step or branch. That's the difference in a nutshell: keyword chunking asks "which passage looks similar to the query," while an intent-indexed header asks "what does this step do, and what does it require to fire."

The deeper reason this matters shows up when you look at how procedures get *indexed* elsewhere in the collection. PRAXIS finds that indexing web-agent procedures by environment *state* and local action pairs beats higher-level workflow abstractions, precisely because the click-by-click specifics—the conditions under which an action is valid—are what carry the reliability Does state-indexed memory outperform high-level workflow memory for web agents?. State is the runtime cousin of a prerequisite: both say "this only applies here." Keyword chunking erases that conditionality; intent-aware headers preserve it.

There's a useful tension worth surfacing, though. Agent Workflow Memory argues for indexing at the *sub-task routine* level—abstracting away example-specific values so a step can be reused across contexts, yielding large gains as train-test gaps widen Can agents learn reusable sub-task routines from past experience?. PRAXIS pushes the opposite way, toward concrete state. So "procedural intent" isn't one altitude: a good header has to name the reusable purpose while still pointing at the conditions that bind it. The art is holding both.

A cross-domain framing sharpens why headers-as-intent even works: LLM Programs treat a complex task as explicit algorithmic control flow that hands each model call only its step-relevant context, hiding everything else Can algorithms control LLM reasoning better than LLMs alone?. A header that indexes intent is doing the same information-hiding job at retrieval time—it lets the system pull *just* the step that matters and the link to the next, instead of dumping a similarity-ranked soup of passages. Rasa's reframing of dialogue understanding as generating commands rather than classifying intents makes a parallel move Can command generation replace intent classification in dialogue systems?: it treats meaning as pragmatics (what action is wanted) rather than semantics (what words match), which is exactly the shift from keyword chunking to intent indexing.

The thing you might not have expected to learn: across these notes, the real failure of keyword chunking isn't that it retrieves the wrong topic—it usually gets the topic right. It's that it loses the *edges*—the prerequisites, the state conditions, the links between steps—and procedures live entirely in their edges. An intent-indexing header is really a way of storing the graph, not just the nodes.


Sources 5 notes

How do logic units preserve procedural coherence better than chunks?

THREAD replaces chunks with four-part logic units—prerequisite, header, body, linker—enabling dynamic multi-step retrieval for how-to questions. Linkers explicitly navigate between steps and branches, addressing both the semantic-vs-task-relevance gap in embeddings and the sequential dependency loss in chunk-based RAG.

Does state-indexed memory outperform high-level workflow memory for web agents?

PRAXIS shows that indexing procedures by environment state and local action pairs yields consistent accuracy and reliability gains across VLM backbones on the REAL benchmark, compared to higher-level workflow abstractions that lose click-by-click specifics.

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces 24.6% relative gain on Mind2Web and 51.1% on WebArena, with larger gains as train-test gaps widen.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Can command generation replace intent classification in dialogue systems?

Rasa's dialogue understanding architecture generates domain-specific commands instead of classifying intents, eliminating annotation requirements, handling context naturally, and scaling without degradation—treating understanding as pragmatics rather than semantics.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a retrieval-systems researcher re-examining procedural indexing in 2026. The open question: should headers index procedural intent (prerequisites, state conditions, action effects) rather than surface keywords?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026. Core claims:
- Fixed-size keyword chunking severs inter-step dependencies; procedures live in their *edges* (prerequisites, state conditions, linkers), not nodes (THREAD, 2024-06).
- State-dependent indexing (action + environment state) outperforms workflow-level abstraction for web-agent procedures; runtime state is the operative cousin of procedural prerequisites (implied in PRAXIS-like work, ~2024).
- Sub-task routine abstraction (reusable, context-independent steps) compounds gains across train-test gaps, but tensionally competes with concrete state binding (Agent Workflow Memory, 2024-09).
- LLM Programs decompose tasks into step-specific prompts with explicit control flow, hiding irrelevant context—intent-indexed headers replicate this information-hiding at retrieval time (2024–2025).
- Intent mismatch in multi-turn conversation causes LLM drift; intent-aware indexing may mitigate (2026-02).

Anchor papers (verify; mind their dates):
- THREAD (arXiv:2406.13372, 2024-06): logic units with prerequisite-header-body-linker structure.
- Agent Workflow Memory (arXiv:2409.07429, 2024-09): sub-task routine abstraction.
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (arXiv:2602.07338, 2026-02).
- Real-Time Procedural Learning From Experience (arXiv:2511.22074, 2025-11).

Your task:
(1) RE-TEST EACH CONSTRAINT. Has keyword chunking been *relaxed* by: improved retrievers (semantic, sparse-dense hybrids), hierarchical or graph-based indexing, agentic re-planning, or multimodal state representation? Has the tension between concrete state and reusable abstraction been resolved by meta-learning, in-context instruction, or dynamic routing? Plainly separate durable question (procedural intent is still underindexed) from perishable limitation (keyword chunking cannot encode it). Cite what has moved.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months: has multi-agent orchestration, memory caching, or tool-use frameworks (Anthropic Claude, OpenAI structured outputs) made intent-aware header indexing less critical or more urgent?
(3) Propose 2 research questions that ASSUME the retrieval regime has shifted: (a) How does intent-indexed retrieval interact with in-context learning when model capacity is abundant? (b) Can procedural intent be *learned* from agent rollouts rather than manually structured?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines