What are the ten intrinsic motivation heuristics that drive participation decisions?
This explores the ten 'motivation heuristics' from the Inner Thoughts framework — the cognitive-psychology-derived criteria an AI uses to decide whether it has something worth saying in a conversation — and how that idea of intrinsic, internally-generated motivation shows up elsewhere in the corpus.
The honest answer up front: the corpus says that there *are* ten heuristics but doesn't enumerate them one by one. What it does tell you is more interesting than a list. Inner Thoughts is a way of deciding whether an AI agent should jump into a conversation at all. It borrows from cognitive psychology and 'think-aloud' studies (the technique where people narrate their reasoning out loud) to generate covert thoughts running in parallel to the conversation, then scores those thoughts against the ten heuristics to judge whether the agent has something genuinely worth contributing (see "Can AI agents learn when they have something worth saying?"). The heuristics are the filter between 'I could speak' and 'I should speak.' That reframing, modeling an internal urge to participate rather than predicting whose turn is next, is the framework's core bet, and participants preferred it 82% of the time over a next-speaker-prediction baseline across seven interaction metrics.
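The 'could speak' versus 'should speak' filter can be sketched as a scoring gate. This is a minimal illustration under stated assumptions, not the framework's actual implementation: the corpus does not name the ten heuristics, so the sketch uses ten anonymous score slots, and the `Thought` class, the mean aggregation, and the 0.6 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

NUM_HEURISTICS = 10  # the corpus gives the count but not the names


@dataclass
class Thought:
    """A covert thought plus one score in [0, 1] per heuristic (placeholder slots)."""
    text: str
    scores: List[float]


def motivation(thought: Thought) -> float:
    """Collapse the heuristic scores into one urge-to-speak value.

    Assumption: a plain mean. The real aggregation rule is not described
    in the corpus.
    """
    if len(thought.scores) != NUM_HEURISTICS:
        raise ValueError(f"expected {NUM_HEURISTICS} scores, got {len(thought.scores)}")
    return mean(thought.scores)


def select_thought(thoughts: List[Thought], threshold: float = 0.6) -> Optional[Thought]:
    """Return the most motivated thought if it clears the threshold, else None.

    None means the agent stays silent: it could speak, but should not.
    """
    if not thoughts:
        return None
    best = max(thoughts, key=motivation)
    return best if motivation(best) >= threshold else None


if __name__ == "__main__":
    weak = Thought("Restate what was just said.", [0.2] * NUM_HEURISTICS)
    strong = Thought("Point out a missing step.", [0.9] * NUM_HEURISTICS)
    chosen = select_thought([weak, strong])
    print(chosen.text if chosen else "(stay silent)")
```

The design point the sketch tries to capture is that silence is a first-class outcome: the gate returns nothing when no thought is motivated enough, rather than always picking a speaker.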
What makes this an Inquiring Line worth pulling is that 'intrinsic motivation' turns out to be a recurring move across the collection, each time solving a different problem. The closest cousin is belief-shift as intrinsic reward: instead of an external critic telling an agent how it did, the agent measures how much its own confidence in the right answer moved after each turn, and treats that internal swing as the reward signal (see "Can an agent's own beliefs guide credit assignment without critics?"). Both ideas locate the drive *inside* the model rather than waiting for an outside scorer: Inner Thoughts asks 'do I have something to say,' belief-shift asks 'did saying it get me closer.'
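The belief-shift idea can be made concrete with a small sketch. The source note says only that per-turn credit comes from log-ratios of sequential probability estimates; the exact form below, the function name `belief_shift_rewards`, and the clipping constant `eps` are assumptions for illustration, not the published method.

```python
import math
from typing import List


def belief_shift_rewards(probs: List[float], eps: float = 1e-8) -> List[float]:
    """Per-turn reward r_t = log(p_t / p_{t-1}).

    probs[t] is the agent's own estimated probability of the correct answer
    after turn t (probs[0] is the estimate before the first turn).
    Values are clipped to [eps, 1] so the logarithm stays finite (assumed detail).
    """
    clipped = [min(max(p, eps), 1.0) for p in probs]
    return [math.log(cur / prev) for prev, cur in zip(clipped, clipped[1:])]


if __name__ == "__main__":
    # Confidence rises, dips after an unhelpful turn, then jumps.
    beliefs = [0.1, 0.2, 0.1, 0.8]
    for turn, reward in enumerate(belief_shift_rewards(beliefs), start=1):
        print(f"turn {turn}: reward {reward:+.3f}")
```

A turn that raises confidence gets positive reward and one that lowers it gets negative reward, with no critic network involved. Under this form the per-turn rewards telescope, so their sum equals the log-ratio of the final belief to the initial one.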
The contrast that sharpens it is the work on feedback that *can't* be internalized. One thread shows that natural feedback actually splits into two kinds, evaluative ('how well did that go') and directive ('here's how to change'), and that collapsing both into a single scalar reward throws away the directional half (see "Can scalar rewards capture all the information in agent feedback?"). Read against Inner Thoughts, that's the same insight from the other side: a single yes/no 'should I speak' signal would be too thin, and scoring against ten heuristics treats the decision as multi-dimensional rather than as one number.
There's also a quiet warning in the corpus about tuning participation too closely to a single user. Personalizing reward models, that is, letting a system optimize hard for one person's preferences, strips out the averaging that keeps aggregate models honest, and the result is sycophancy and echo chambers (see "Does personalizing reward models amplify user echo chambers?"). An agent whose motivation heuristics are calibrated only to 'will this please you' would drift the same way; the value of grounding the heuristics in general cognitive-psychology principles is that they're not just optimizing for approval.
If you want to go deeper, start with the Inner Thoughts note itself for the framework and its five-stage pipeline, then read the belief-shift note as the reinforcement-learning analogue of the same 'motivation comes from inside' bet. After that, read the related work on how the field is learning when versus how to deploy reasoning, where targeted, well-timed intervention beats both constant interruption and full hands-off autonomy (see "Does targeted human intervention outperform both full autonomy and exhaustive oversight?"). The throughline you didn't know you were looking for: knowing *when* to act is becoming its own research problem, separate from knowing *how*.
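The 'when to act' question in the intervention work can be sketched as confidence routing: interrupt the human only for steps the system is unsure about. The corpus says the mode is confidence-routed but does not describe the routing rule, so the `Step` class, the `route` function, and the 0.7 threshold below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One unit of agent work with a self-reported confidence in [0, 1]."""
    name: str
    confidence: float


def route(steps: List[Step], threshold: float = 0.7) -> Tuple[List[str], List[str]]:
    """Split steps into (autonomous, needs_review) by confidence.

    Assumed rule: anything below the threshold is escalated to a human;
    everything else proceeds without interruption.
    """
    autonomous = [s.name for s in steps if s.confidence >= threshold]
    needs_review = [s.name for s in steps if s.confidence < threshold]
    return autonomous, needs_review


if __name__ == "__main__":
    plan = [
        Step("collect sources", 0.95),
        Step("choose evaluation metric", 0.40),
        Step("format results table", 0.90),
    ]
    auto, review = route(plan)
    print("run autonomously:", auto)
    print("ask the human about:", review)
```

Setting the threshold to 0 reproduces full autonomy and setting it above 1 reproduces step-by-step oversight, which is one way to see selective interruption as the middle of a single dial.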
Sources (5 notes)
A five-stage framework that generates covert thoughts parallel to conversation significantly outperforms next-speaker prediction baselines. Drawing from cognitive psychology and think-aloud studies, the framework uses 10 motivation heuristics to evaluate when an agent has something worth contributing. Participants preferred it 82% of the time across seven interaction metrics.
ΔBelief-RL uses log-ratios of sequential probability estimates to assign per-turn credit without critic networks or process reward models. Tested on 20 Questions, smaller models trained this way matched or exceeded prior SOTA and larger baselines while generalizing beyond training.
Natural feedback carries two orthogonal types of information: evaluative (how well an action performed) and directive (how it should change). Scalar rewards capture evaluation but discard directional specifics that token-level distillation can recover, making the two complementary rather than redundant.
Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.
AutoResearchClaw's confidence-routed CoPilot mode achieved 87.5% acceptance, substantially outperforming full autonomy (25%) and step-by-step oversight (50%). The key insight: selective interruption avoids both uncaught critical errors and the coherence degradation caused by constant human interruption.