SYNTHESIS NOTE

Can embedding future information in training data improve planning?

This explores whether inserting lookahead tokens containing future goals into training sequences helps models learn long-range planning without changing their architecture. The question matters because it tests whether data-level changes can produce architectural-level reasoning improvements.

Synthesis note · 2026-02-22 · sourced from LLM Architecture

TRELAWNEY (2504.11336) identifies a structural mismatch in causal language model training: each token is predicted only from the preceding context, whereas in human writing and reasoning the goal is usually known before the exact arguments or phrasing. Teacher forcing compounds the problem. It speeds up training by feeding the model the correct previous tokens, but models trained this way tend to latch onto local patterns and surface-level correlations rather than learn long-range dependencies.
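
The mismatch can be made concrete with a minimal sketch of how teacher forcing turns one gold sequence into training pairs. The toy tokens and the name `teacher_forcing_pairs` are illustrative assumptions, not taken from the paper. Each pair feeds the gold prefix and asks for the next token, so the final "goal" token only ever appears as the last target and never conditions any earlier prediction.

```python
from typing import List, Tuple


def teacher_forcing_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Return the (context, target) pairs seen under teacher forcing.

    At step t the model is fed the gold prefix tokens[:t] (not its own earlier
    predictions) and is trained to predict tokens[t]. Nothing after position t
    is visible, which is the forward-only limitation described above.
    """
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]


pairs = teacher_forcing_pairs(["plan", "step1", "step2", "goal"])
for context, target in pairs:
    print(context, "->", target)
# ['plan'] -> step1
# ['plan', 'step1'] -> step2
# ['plan', 'step1', 'step2'] -> goal
```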

The fix is data-centric rather than architectural. TRELAWNEY augments training data by interleaving special lookahead tokens (<T> and </T>) that wrap future information. The placement and content of these lookahead spans can be random or task-specific. The model is then trained on the modified data with standard training infrastructure: no architecture changes and no additional training tricks.
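
One way the augmentation could look, as a minimal sketch assuming the lookahead content is a verbatim copy of a later span of the same sequence and that placement is uniform at random. The function names, the `max_span` parameter, and this particular random policy are hypothetical; the paper's random and task-specific policies may choose placement and content differently.

```python
import random
from typing import List, Optional

# Lookahead delimiters named in the note; everything else here is illustrative.
LOOKAHEAD_OPEN, LOOKAHEAD_CLOSE = "<T>", "</T>"


def insert_lookahead(tokens: List[str], position: int,
                     future_start: int, future_end: int) -> List[str]:
    """Insert a copy of tokens[future_start:future_end], wrapped in <T> ... </T>,
    just before tokens[position].

    The copied span must start at or after `position`, so the block carries
    information the model has not yet seen at that point in the sequence.
    """
    if not (0 <= position <= future_start < future_end <= len(tokens)):
        raise ValueError("lookahead span must come from later in the sequence")
    block = [LOOKAHEAD_OPEN, *tokens[future_start:future_end], LOOKAHEAD_CLOSE]
    return tokens[:position] + block + tokens[position:]


def random_lookahead(tokens: List[str], max_span: int = 3,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical random policy: pick an insertion point uniformly, then copy
    a span of 1..max_span tokens that starts at or after that point."""
    if rng is None:
        rng = random.Random()
    position = rng.randrange(len(tokens))
    future_start = rng.randrange(position, len(tokens))
    future_end = rng.randint(future_start + 1,
                             min(future_start + max_span, len(tokens)))
    return insert_lookahead(tokens, position, future_start, future_end)


seq = ["start", "a", "b", "c", "goal"]
print(insert_lookahead(seq, 1, 4, 5))
# ['start', '<T>', 'goal', '</T>', 'a', 'b', 'c', 'goal']
```

Because the result is still an ordinary token sequence, it can pass through an unchanged next-token training loop, and deleting the <T>…</T> span recovers the original text.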

The reported results span planning, algorithmic reasoning, and story generation. Because the model is trained on sequences that contain lookahead spans, it also learns to generate goals itself, and this byproduct of the augmentation can further improve planning and reasoning when used at inference time. This training-time goal conditioning complements "Does planning direction affect how hard problems become?", which supplies goal information at inference time by reversing the search direction; TRELAWNEY can be read as internalizing the benefits of backward planning during training.
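
The note does not specify how goals are used at decoding time. A minimal sketch, assuming the model was trained with the <T>/</T> convention: either write a goal into the lookahead block yourself, or leave the block open and let the model generate its own goal, then strip the lookahead spans from the final output. `fake_model` is a hypothetical stand-in for a trained model's decoder, and the helper names are illustrative.

```python
from typing import List, Optional

OPEN, CLOSE = "<T>", "</T>"


def goal_conditioned_prompt(prefix: List[str],
                            goal: Optional[List[str]] = None) -> List[str]:
    """Open a lookahead block after the prompt prefix.

    With an explicit `goal`, the block is filled in and closed by the caller;
    with goal=None it is left open so the model writes its own goal first.
    """
    if goal is None:
        return prefix + [OPEN]
    return prefix + [OPEN, *goal, CLOSE]


def strip_lookahead(tokens: List[str]) -> List[str]:
    """Remove every <T> ... </T> span, keeping only the ordinary output."""
    kept: List[str] = []
    inside = False
    for tok in tokens:
        if tok == OPEN:
            inside = True
        elif tok == CLOSE:
            inside = False
        elif not inside:
            kept.append(tok)
    return kept


def fake_model(prompt: List[str]) -> List[str]:
    """Hypothetical stand-in for a trained model: closes the goal, then plans."""
    return ["reach", "goal", CLOSE, "step1", "step2"]


prompt = goal_conditioned_prompt(["task:"])
completion = prompt + fake_model(prompt)
print(strip_lookahead(completion))  # ['task:', 'step1', 'step2']
```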

TRELAWNEY is a different intervention from multi-token prediction (Bachmann & Nagarajan, 2024; Gloeckle et al., 2024), which trains the model to predict several future tokens simultaneously. Multi-token prediction modifies the training objective and often the architecture. TRELAWNEY modifies only the training data, which keeps it compatible with existing infrastructure and, in principle, applicable at any model size.
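
To make the contrast concrete, here is a sketch of the training targets each approach produces, assuming toy string tokens; the helper names are hypothetical. Multi-token prediction changes what each position must predict, while a TRELAWNEY-style setup keeps the plain next-token target and changes only the input sequence.

```python
from typing import List, Tuple


def ntp_targets(tokens: List[str]) -> List[Tuple[int, str]]:
    """Next-token prediction: position t is trained to predict tokens[t + 1]."""
    return [(t, tokens[t + 1]) for t in range(len(tokens) - 1)]


def mtp_targets(tokens: List[str], k: int) -> List[Tuple[int, Tuple[str, ...]]]:
    """Multi-token prediction: position t predicts tokens[t + 1 : t + 1 + k]
    jointly, a change to the objective (often served by extra output heads)."""
    return [(t, tuple(tokens[t + 1:t + 1 + k])) for t in range(len(tokens) - k)]


seq = ["s", "a", "b", "g"]
# TRELAWNEY-style: same next-token objective, different data.
augmented = ["s", "<T>", "g", "</T>", "a", "b", "g"]

print(ntp_targets(seq))            # [(0, 'a'), (1, 'b'), (2, 'g')]
print(mtp_targets(seq, 2))         # [(0, ('a', 'b')), (1, ('b', 'g'))]
print(ntp_targets(augmented)[:3])  # [(0, '<T>'), (1, 'g'), (2, '</T>')]
```

In the augmented sequence, the positions after `</T>` still predict `a`, `b`, `g` one token at a time, but with the goal already in context.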

If training data format shapes reasoning strategy more than domain does (see "Does training data format shape reasoning strategy more than domain?"), TRELAWNEY is further evidence that a format intervention at the training-data level can have architectural-level effects. The lookahead tokens create a new "format" that teaches the model to condition generation on future goals, shifting its reasoning strategy from purely local next-token prediction to goal-conditioned generation.

The connection to "Can backward reasoning during training improve forward reasoning?" is complementary: backward reasoning provides consistency checking from the end state, while lookahead tokens provide goal information from the future. Both address the forward-only limitation of standard next-token prediction (NTP), from different angles.

Inquiring lines that read this note

This note is a source for the following research framings.

Can self-supervised signals enable process supervision without human annotation?

Can explicit goal state scaffolding at inference time transfer to autonomous tracking through training?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

What distinguishes planning knowledge from an executable plan that works?

How should memory consolidation strategies shape agent performance over time?

What memory and planning capabilities do AI companions need for evolving user needs?

Can next-token prediction alone produce genuine language understanding?

What structural advantages do diffusion language models offer over autoregressive methods?

What memory architectures best support persistent reasoning across extended interactions?

Can episodic and semantic memory improve long-horizon task reasoning?

What makes weaker teacher models effective for stronger student training?

How can weak-to-strong progressive training target planning without interfering with grounding?

Why do reward structures fail to shape long-term agent learning?

Can architectural changes like decoupling intent understanding help overcome next-turn reward limitations?

Does decoupling planning from execution improve multi-step reasoning accuracy?

How does latent reasoning compare to verbalized chain-of-thought?

What capability tradeoffs emerge when scaling model reasoning abilities?

Can training models on backward reasoning improve their forward planning ability?

What pretraining choices and baseline capability constrain reinforcement learning gains?

How do training priors constrain what context information can override?

Can goal information injected at inference time replace goal-conditioned training?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

What data properties enable transformers to learn sequential decision-making in context?

Do language models develop causal world models or rely on statistical patterns?

Does next-state prediction alone build mechanistic world models or just sophisticated interpolation?

Related concepts in this collection


Does training data format shape reasoning strategy more than domain? What explains why models trained on multiple-choice data reason differently than those trained on free-form text? The research isolates format and domain effects to measure which one matters more.
data-level format intervention with architectural-level effects
Can backward reasoning during training improve forward reasoning? Does training models to reason backward—generating inverse questions and solutions—build internal consistency checking that transfers to forward-only inference? This explores whether backward capacity internalized during training without test-time deployment can enhance reasoning quality.
complementary future-information injection
Can training data augmentation match test-time compute scaling benefits? Can generating thinking trajectories during pretraining unlock the same efficiency gains that test-time scaling provides at inference? This explores whether the compute-allocation principle works across the training-inference boundary.
both are data-centric training augmentations
Does planning direction affect how hard problems become? Planning research typically goes forward only. But some problems get easier when you work backward from the goal. What makes direction matter, and can language models exploit this?
both address the forward-only limitation of autoregressive generation: TRELAWNEY injects goal/future information into training data so the model learns to condition on goals, while backward planning reverses the search direction at inference time; TRELAWNEY could be seen as training the model to internalize the benefits backward planning provides at test time
Which sentences actually steer a reasoning trace? Can we identify which sentences in a reasoning trace have outsized influence on the final answer? Three independent methods converge on a surprising answer about planning and backtracking.
thought anchors (especially planning sentences) may be the behavioral manifestation of TRELAWNEY-like goal conditioning: the model generates planning sentences that function as self-imposed lookahead tokens, conditioning subsequent generation on anticipated goals
