SYNTHESIS NOTE
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling

Can routing mask future experts to prevent knowledge leakage?

Can models be built so that they respect query timestamps by selectively silencing experts trained on future data? This explores whether temporal causality can be enforced through architecture rather than external retrieval.

Synthesis note · 2026-06-03 · sourced from Test Time Compute

LLMs trained on a fixed web snapshot go stale and, worse, risk temporal leakage: answering as if they know information that postdates the query. Standard pretraining merges all time periods indiscriminately, so the model has no principled way to respect a query's timestamp. TiMoE makes temporal grounding architectural. It pre-trains a set of GPT-style experts on disjoint two-year slices of a 2013–2024 corpus; at inference it masks every expert whose training window ends after the query timestamp and merges the remaining experts' log-probabilities in a shared space. Because no masked expert contributes to the output, the prediction is causally valid by construction, while the surviving experts still supply multi-period breadth.
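The mask-then-merge step can be sketched in a few lines. This is an illustrative sketch, not TiMoE's implementation: the expert names, the year-level granularity, and the uniform log-space mixture used for merging are all assumptions, since the note only says that the surviving experts' log-probabilities are merged in a shared space.

```python
import math

# Hypothetical expert table: name -> last year of the expert's training window.
# Six disjoint two-year slices cover 2013-2024, as described in the note.
EXPERT_WINDOW_END = {
    "e2013_2014": 2014,
    "e2015_2016": 2016,
    "e2017_2018": 2018,
    "e2019_2020": 2020,
    "e2021_2022": 2022,
    "e2023_2024": 2024,
}


def causal_mask(window_end, query_year):
    """Keep only experts whose training window does not end after query_year.

    Assumption: timestamps are whole years, so an expert whose window ends
    in the query year itself is kept. A stricter variant would use `<`.
    """
    return [name for name, end in window_end.items() if end <= query_year]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def merge_log_probs(expert_log_probs, window_end, query_year):
    """Merge the unmasked experts' next-token log-probabilities.

    expert_log_probs maps expert name -> list of log-probs over one shared
    vocabulary. Assumption: the merge is a uniform mixture computed in log
    space; the real merging rule may weight experts differently.
    """
    active = causal_mask(window_end, query_year)
    if not active:
        raise ValueError("no expert is causally valid for this query year")
    vocab_size = len(expert_log_probs[active[0]])
    log_k = math.log(len(active))
    return [
        logsumexp([expert_log_probs[name][tok] for name in active]) - log_k
        for tok in range(vocab_size)
    ]


# Toy two-token vocabulary; each expert assigns a different probability to token 0.
toy = {
    name: [math.log(p), math.log(1.0 - p)]
    for name, p in zip(EXPERT_WINDOW_END, [0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
}
active_2018 = causal_mask(EXPERT_WINDOW_END, 2018)
merged_2018 = merge_log_probs(toy, EXPERT_WINDOW_END, 2018)
```

For a 2018 query, only the three experts with windows ending in 2014, 2016, and 2018 remain active. Changing what the later experts predict cannot alter the merged distribution, which is the sense in which the guarantee is architectural.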

The result quantifies the trade-off. On the new 10k-question TSQA benchmark (alternatives labelled past/future/irrelevant), TiMoE cuts future-knowledge errors by up to ~15% and delivers steadier accuracy across years. The price is a "manageable cost of time-awareness": a slight underperformance on eight standard NLP tasks rather than a fundamental barrier. The keeper is the design principle: temporal causality can be enforced by routing over time-partitioned parameters, not only by external retrieval or post-hoc verification.
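To make "future-knowledge errors" concrete, here is a minimal sketch of tallying which kind of alternative a model picks. The `error_breakdown` helper, the `"correct"` label, and the toy data are hypothetical; the note only states that TSQA's alternatives are labelled past, future, or irrelevant, and the numbers below are not benchmark results.

```python
from collections import Counter

# Assumed label set: the three distractor types named in the note,
# plus a hypothetical "correct" label for the right answer.
LABELS = ("correct", "past", "future", "irrelevant")


def error_breakdown(chosen_labels):
    """Return the fraction of questions on which each kind of option was chosen.

    chosen_labels holds, per question, the label of the option the model picked.
    """
    if not chosen_labels:
        raise ValueError("need at least one answered question")
    unknown = set(chosen_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(chosen_labels)
    total = len(chosen_labels)
    return {label: counts[label] / total for label in LABELS}


# Toy data for illustration only.
picks = ["correct"] * 6 + ["future"] * 2 + ["past"] + ["irrelevant"]
rates = error_breakdown(picks)
future_error_rate = rates["future"]
```

Under this reading, the future-knowledge error rate is the share of questions on which the model picks the alternative that postdates the query, and a causal mask is meant to push that share down.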

This sits alongside retrieval-time and prompt-time temporal fixes as the parametric option. It complements "Does AI text generation unfold through temporal reflection?" (the RAG route to temporal grounding) by pushing the same concern into the model's own expert structure, trading some general accuracy for guaranteed causal validity.

Original note title

temporal grounding can be architectural — time-sliced experts with causal routing that masks future experts eliminate future-knowledge leakage