From the Archive · 2026-05-28

From Simulation to Enaction: Post-trained language models recognize and react to their own generations

Asvin G., Jack Lindsey · arXiv:2605.25459

This work joins a growing conversation about internal mechanisms that track what models know, but shifts focus from passive self-knowledge toward something more active: recognizing that the model is in a consequential loop where its outputs fold back as inputs. The distinction between implicit and explicit recognition (models can sense their on-policy state without being able to articulate it reliably) echoes recent findings that behavioral awareness can emerge without explicit training to introspect, suggesting multiple routes to self-modeling in neural networks. What remains unclear is whether this on-policy awareness actually causes the behavior change or is merely a downstream artifact of the reward signals that post-training introduces, and whether models might also encode this state-awareness through unrelated data channels we haven't yet mapped.

Adjacent research

Do models know what they don't know? Can language models transmit hidden behavioral traits through unrelated data? Can language models describe their own learned behaviors?

Go deeper into LLM Reasoning and Architecture→

Foundation Protocol: A Coordination Layer for Agentic Society

Bang Liu, Yongfeng Gu, Jiayi Zhang, et al. · arXiv:2605.23218

As autonomous agents move from isolated tools toward coordinated social infrastructure (browsing, purchasing, deploying, and transacting with one another), the research conversation has begun to surface a critical shift: raw model capability matters less than whether agents can reliably work together, exchange value, and remain accountable. Recent work has documented how coordination failures emerge predictably in multi-agent systems, and how safety protocols for identity, authorization, and oversight need standardization rather than model-level fixes. This paper argues for a graph-first coordination layer that treats policy, provenance, and audit as first-class concerns. The deeper tension it raises is whether that infrastructure should be minimally prescriptive (wrapping existing protocols) or more structural. Given that standardized artifacts outperform natural-language coordination, how much constraint can a coordination layer impose and still remain genuinely "pluralistic" while keeping accountability non-negotiable?

Adjacent research

What makes delegation work beyond just splitting tasks? Does structured artifact sharing outperform conversational coordination? What security protocols do autonomous agents actually need?

Go deeper into Agentic and Multi-Agent Systems→

Look Before You Leap: Autonomous Exploration for LLM Agents

Ziang Ye, Wentao Shi, Yuxin Liu, et al. · arXiv:2605.16143

The tension between acting and observing in unfamiliar settings sits at the heart of adaptive agency: recent work has documented how LLMs struggle with exploration in sequential decision-making despite their reasoning prowess, and this paper frames that struggle as a fundamental training problem rather than an inference-time limitation. Interestingly, the proposed Explore-then-Act paradigm echoes a broader insight—that interaction steps scale differently from reasoning depth, suggesting agents need distinct budgets for information-gathering versus task execution. What remains unclear is whether decoupling exploration from task execution truly solves the problem or simply postpones it: if RL-trained agents exhibit self-locking patterns that prevent effective information-seeking even when trained to explore, does a separate exploration phase protect against that drift, or do agents risk reverting to narrow behaviors once task pressure returns?

Adjacent research

Can agent deployment itself generate training signals automatically? Why do LLMs struggle with exploration in simple decision tasks? Does agent interaction time scale separately from reasoning depth?

Go deeper into Agentic and Multi-Agent Systems→

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Jiaqi Liu, Shi Qiu, Mairui Li, et al. · arXiv:2605.20025

AutoResearchClaw arrives at a moment when researchers are pushing beyond the linear "prompt → execute → stop" model of autonomous science toward systems that learn from failure and iterate as humans do. The paper's core insight, that research domains require specific structural properties to benefit from automation, echoes across recent work questioning which scientific tasks actually suit autonomous optimization. What's particularly interesting is the paper's claim that specialized agents debating hypotheses outperform single-agent reasoning, yet the human-in-the-loop results suggest the real win comes from knowing *where* to intervene rather than intervening everywhere. This raises a subtle tension: if the system's performance hinges on precisely timed human feedback at high-leverage moments, are we automating research discovery or automating the task of *deciding when to ask for help*? And if the latter, what does that suggest about whether these systems genuinely self-improve or merely execute human-designed metacognitive loops?

Adjacent research

Can autonomous research pipelines discover AI architectures that AutoML cannot? What makes a research domain suitable for autonomous optimization? Can computational power accelerate scientific discovery itself?

Go deeper into Agentic and Multi-Agent Systems→

Post-training makes large language models less human-like

Marcel Binz, Elif Akata, Abdullah Almaatouq, et al. · arXiv:2605.07632

A growing tension has emerged in how we use large language models as behavioral proxies: the same fine-tuning processes that enable LLMs to predict human decision-making appear to push them further from authentic human-like responses, suggesting that alignment and fidelity to human behavior may be competing objectives. This finding echoes a broader pattern in machine learning where models can optimize for downstream performance metrics while drifting from the phenomena they're meant to model, much as language models may solve benchmarks through surface patterns rather than structural understanding. Interestingly, recent work on interview-based agent modeling hints that richer, more individualized training data might partially recover behavioral alignment, yet the present results on persona induction suggest this remains an open challenge. The question then becomes whether we need fundamentally different training paradigms to preserve human-likeness, or whether using LLMs as behavioral models simply requires accepting this trade-off as inherent to the technology.

Adjacent research

Can language models learn to model human decision making? Can AI learn social norms better than humans? Can models pass tests while missing the actual grammar?

Go deeper into Language Understanding and Pragmatics→