INQUIRING LINE


An AI that knows your preferences doesn't wait to be asked — it acts the moment new context makes them relevant.

What triggers control processes to act on stored preference knowledge?


This explores the architectural question of *when* a system reaches for its stored model of you (the trigger that turns dormant preference knowledge into action) rather than how that knowledge gets stored. The corpus's clearest answer comes from M3-Agent (see "Can agents learn preferences by watching rather than asking?"), which splits the problem into two parallel processes: a *memorization* process that quietly builds an entity-centric graph from continuous observation, and a separate *control* process that decides when to consult it and act. The striking move is that the control process is triggered by ongoing observation itself: the agent infers and acts on preferences *without being asked*, mirroring how human cognition binds scattered facts about a person and surfaces them when relevant. So the trigger isn't a query; it's the arrival of new context that the control loop continuously evaluates against what it already knows.
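To make the split concrete, here is a minimal sketch of an observation-triggered control step. It is not M3-Agent's implementation: `Observation`, `PreferenceGraph`, `control_step`, and `observe` are hypothetical names, topics are assumed to arrive pre-tagged, and the two processes run one after the other here rather than in parallel.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    entity: str                 # who the observation is about
    topics: frozenset[str]      # what the new context touches on (assumed pre-tagged)
    stated: dict[str, str] = field(default_factory=dict)  # topic -> preference seen now


class PreferenceGraph:
    """Hypothetical entity-centric store: entity -> {topic: preference}."""

    def __init__(self) -> None:
        self._prefs: dict[str, dict[str, str]] = {}

    def memorize(self, obs: Observation) -> None:
        # Memorization process: fold new evidence into the entity's record.
        self._prefs.setdefault(obs.entity, {}).update(obs.stated)

    def lookup(self, entity: str, topic: str) -> str | None:
        return self._prefs.get(entity, {}).get(topic)


def control_step(graph: PreferenceGraph, obs: Observation) -> list[str]:
    # Control process: fires on every observation, with no user query involved.
    actions = []
    for topic in sorted(obs.topics):
        pref = graph.lookup(obs.entity, topic)
        if pref is not None:
            actions.append(f"apply {topic}={pref} for {obs.entity}")
    return actions


def observe(graph: PreferenceGraph, obs: Observation) -> list[str]:
    actions = control_step(graph, obs)  # check new context against what is known
    graph.memorize(obs)                 # then record what was just seen
    return actions
```

The first time a topic shows up, `observe` returns nothing and only memorizes; the next observation touching that topic produces an action, even though nobody asked for one.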

But once something fires retrieval, *what* gets pulled matters as much as *when*. PRIME (see "Does abstract preference knowledge outperform specific interaction recall?") found that abstract preference summaries beat replaying specific past interactions and, tellingly, that *recency*-based recall outperforms *similarity*-based retrieval. That's a clue about the trigger itself: a system that fetches "what just happened" acts more effectively than one that fetches "what's textually closest to the current input." A control process keyed to recency is doing something different from a search engine keyed to similarity.
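The difference between the two recall policies can be shown on a toy memory. This is only an illustration of how they diverge, not a reproduction of PRIME's setup or results: `MemoryItem` is a hypothetical record type, and Jaccard token overlap stands in for whatever similarity measure a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    step: int   # when the interaction happened (larger means more recent)
    text: str


def recall_by_recency(items: list[MemoryItem], k: int) -> list[MemoryItem]:
    # Fetch "what just happened": the k most recent items.
    return sorted(items, key=lambda m: m.step, reverse=True)[:k]


def token_overlap(a: str, b: str) -> float:
    # Jaccard overlap of lowercase tokens; an assumed stand-in for embedding similarity.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def recall_by_similarity(items: list[MemoryItem], query: str, k: int) -> list[MemoryItem]:
    # Fetch "what's textually closest": the k items most similar to the query.
    return sorted(items, key=lambda m: token_overlap(m.text, query), reverse=True)[:k]


items = [
    MemoryItem(1, "ordered spicy ramen for dinner"),
    MemoryItem(2, "asked for quiet jazz playlist"),
    MemoryItem(3, "said no more spicy food this week"),
]
query = "suggest ramen for dinner"

latest = recall_by_recency(items, 1)[0]
closest = recall_by_similarity(items, query, 1)[0]
```

Here `closest` is the old ramen order, while `latest` is the newer statement that overrides it. Recency surfaces the update that similarity misses because the update shares no words with the query.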

There's also a question of *confidence* as a gate. Implicit feedback (watches, clicks, purchases) resolves into two separate magnitudes: preference *and* confidence (see "Can implicit feedback reveal both preference and confidence?"). A control process worth its salt shouldn't act on a stored preference it's barely sure about; the confidence dimension is the natural throttle on when to commit. Relatedly, ΔBelief-RL (see "Can an agent's own beliefs guide credit assignment without critics?") shows an agent can read its *own* shifting beliefs as a dense, per-turn signal: the system watches how its internal estimate moves and uses that movement as per-turn credit, with no external critic. That points to a control-process trigger sourced entirely from the agent's evolving certainty.
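Both ideas fit in a few lines. The sketch below uses the preference/confidence split from Hu, Koren, and Volinsky (binary preference, confidence growing linearly with the observed count) and a log-ratio of successive probability estimates in the spirit of ΔBelief-RL. The gate `should_act`, its `min_confidence` threshold, and the default `alpha` are assumptions for illustration, and the exact credit formula in the ΔBelief-RL paper may differ from the plain log-ratio used here.

```python
import math


def preference_and_confidence(r: float, alpha: float = 40.0) -> tuple[int, float]:
    # r is the raw implicit signal (e.g. number of watches or purchases).
    p = 1 if r > 0 else 0      # preference: did the user engage at all?
    c = 1.0 + alpha * r        # confidence: grows with the amount of evidence
    return p, c


def should_act(r: float, min_confidence: float, alpha: float = 40.0) -> bool:
    # Hypothetical gate: act only on a positive preference held with enough confidence.
    p, c = preference_and_confidence(r, alpha)
    return p == 1 and c >= min_confidence


def belief_deltas(probs: list[float]) -> list[float]:
    # Per-turn signal: log-ratio of each probability estimate to the previous one.
    # Positive means the agent's belief rose this turn; negative means it fell.
    return [math.log(now / prev) for prev, now in zip(probs, probs[1:])]
```

With these definitions, a single faint signal (`r = 0.05`) does not clear a threshold of 10 while a stronger one (`r = 0.5`) does, and `belief_deltas([0.1, 0.2, 0.1])` yields one positive and one negative turn.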

Worth a lateral glance: the LLM Programs approach (see "Can algorithms control LLM reasoning better than LLMs alone?") inverts the framing. Instead of the model deciding when to act on stored state, an explicit *algorithm* manages control flow and hands each LLM call only the context that step needs. Here the "trigger" is hardcoded: the program, not the model, decides when stored knowledge becomes relevant. The contrast with M3-Agent is the real takeaway: control over stored preferences can be *learned and observational* (the agent senses when to act) or *programmed and deterministic* (the scaffold dictates it). The corpus suggests the harder, more human-like version is the former, where continuous observation plus a confidence-weighted, recency-biased memory decides on its own when knowing-you should become doing-something.
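The programmed variant is easy to sketch. In the example below the scaffold, not the model, decides whether a stored preference enters the prompt, and each call sees only what its step needs. The `LLM` callable type, the `recommend` function, and the prompt wording are all hypothetical; any real model client would be wrapped to fit the `LLM` signature.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def recommend(llm: LLM, request: str, stored_preferences: dict[str, str]) -> str:
    # Step 1: the model sees only the request and names its topic.
    topic = llm(f"Name the single topic of this request: {request}").strip().lower()

    # Step 2: the program decides whether stored knowledge is relevant.
    preference: Optional[str] = stored_preferences.get(topic)

    # Step 3: the model sees the request plus at most one matching preference.
    prompt = f"Answer the request: {request}"
    if preference is not None:
        prompt += f"\nUser preference for {topic}: {preference}"
    return llm(prompt)
```

The trigger here is the dictionary lookup in step 2. The model never sees the full preference store, which is the information hiding the approach relies on.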

Sources (5 notes)

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can implicit feedback reveal both preference and confidence?

Hu, Koren, and Volinsky show that implicit signals (watches, purchases, clicks) encode preference and confidence as two distinct dimensions. Explicit ratings collapse these into one number, losing information about certainty in the preference estimate.

Can an agent's own beliefs guide credit assignment without critics?

ΔBelief-RL uses log-ratios of sequential probability estimates to assign per-turn credit without critic networks or process reward models. Tested on 20 Questions, smaller models trained this way matched or exceeded prior SOTA and larger baselines while generalizing beyond training.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes (1.72 match)
Preference Discerning with LLM-Enhanced Generative Retrieval (1.68 match)
PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time (1.67 match)
Intrinsic Credit Assignment for Long Horizon Interaction (0.92 match)
Collaborative Filtering for Implicit Feedback Datasets (0.88 match)
Learning to Reason without External Rewards (0.86 match)
Can Large Reasoning Models Self-Train? (0.86 match)
Reinforcement Learning via Self-Distillation (0.86 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether control-process triggers for stored preference knowledge have shifted since early 2025. The question remains: what *fires* a system to consult and act on dormant user models—is it explicit query, continuous observation, confidence thresholds, or programmed scaffolding?

What a curated library found — and when (dated claims, not current truth):
• M3-Agent (2025) showed control processes are triggered by *ongoing observation* of new context, not explicit queries—the system infers when to act on stored preferences without being asked.
• PRIME (2025) found abstract preference summaries outperform episodic replay, and crucially, *recency-based* retrieval beats *similarity-based* retrieval—suggesting the trigger favors temporal proximity over semantic match.
• Implicit feedback splits into two paired magnitudes: preference *and* confidence (pre-2025), implying confidence should gate whether stored preferences are acted upon.
• ΔBelief-RL (pre-2025) showed agents can use their *own* shifting belief estimates as a dense reward signal—the control trigger sourced from internal certainty, not external feedback.
• LLM Programs (pre-2025) inverted the framing: explicit algorithms manage control flow, hardcoding when stored knowledge enters the pipeline, versus learned/observational triggers.

Anchor papers (verify; mind their dates):
• arXiv:2507.04607 PRIME (2025-07)
• arXiv:2602.12342 Intrinsic Credit Assignment for Long Horizon Interaction (2026-02)
• arXiv:2412.08604 Preference Discerning with LLM-Enhanced Generative Retrieval (2024-12)
• arXiv:2501.09223 Foundations of Large Language Models (2025-01)

Your task:
(1) RE-TEST: For each trigger mechanism above (observation-driven, recency-biased, confidence-gated, belief-based), determine whether post-2025 advances in multi-agent orchestration, memory architectures (flash attention, hybrid episodic–semantic stores), or reward modeling have relaxed, overturned, or sharpened the constraint. Separate the durable question—*when should preference knowledge become action*—from perishable limitations (e.g., recency vs. similarity trade-offs). Cite what resolved each.
(2) Surface the strongest *contradiction* or *superseding* work from the last 6 months: does any recent paper argue that control triggers should be *explicit* rather than observational, or that *similarity* outperforms *recency* in realistic deployments?
(3) Propose 2 research questions assuming the regime has shifted—e.g., "If control processes now operate at sub-second latencies via streaming memory, do triggers migrate from episodic boundaries to continuous confidence windows?" or "Does multi-agent preference aggregation require *collective* confidence thresholds rather than per-agent gates?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
