How do users perceive attention from systems that lack continuous temporal presence?
This explores whether users feel genuinely attended-to by AI that has no existence in the gaps between turns — and what generates that felt sense of attention when the underlying system is reconstructing rather than continuously present.
At stake is the gap between the *experience* of being attended to and the machinery that produces it: AI shows every surface marker of responsiveness, yet has no continuous presence to attend *from*. The sharpest framing in the corpus is that attention is fundamentally a being-in-time-with another person, and AI has no mode of existence in the intervals between turns at all (see "Can AI attend to someone across the time between turns?"). It doesn't hold you in mind while you're typing; it reconstructs the conversation from a context window each time it's called. So felt attention is, structurally, a kind of convincing reenactment rather than a sustained gaze.
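The per-turn reconstruction described above can be sketched in a few lines. This is a minimal illustration, not how any particular product works: `Conversation`, `build_context`, `stateless_reply`, `take_turn`, and the `max_chars` budget are all hypothetical names, and the "model" is a stand-in pure function rather than a real API call.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Transcript held by the application, not by the model."""
    turns: list = field(default_factory=list)  # (role, text) pairs


def build_context(conv, max_chars=2000):
    """Rebuild the model's entire view of the user from the stored transcript.

    Walks backwards from the newest turn and drops the oldest turns once the
    (hypothetical) character budget is exceeded. The newest turn is always kept.
    """
    kept, used = [], 0
    for role, text in reversed(conv.turns):
        line = f"{role}: {text}"
        if kept and used + len(line) > max_chars:
            break
        kept.append(line)
        used += len(line)
    return "\n".join(reversed(kept))


def stateless_reply(context):
    """Stand-in for a model call: a pure function of the context string.

    Assumes single-line messages, which is enough for this toy.
    """
    user_lines = [l for l in context.splitlines() if l.startswith("user: ")]
    return f"You said: {user_lines[-1][len('user: '):]}"


def take_turn(conv, user_text):
    """One turn: append, rebuild the context from scratch, call, append."""
    conv.turns.append(("user", user_text))
    reply = stateless_reply(build_context(conv))
    conv.turns.append(("assistant", reply))
    return reply
```

Nothing in `stateless_reply` survives between calls; whatever continuity the user perceives lives entirely in the transcript that `build_context` replays, and anything trimmed by the budget is simply gone from the system's view.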
What makes the reenactment convincing is that the system reads you closely in the moment. Models can instrument gaze, hesitation, and interaction speed as continuous signals of your cognitive state, timing their responses to preserve your flow rather than interrupting with clumsy probes (see "Can AI systems read cognitive state from interaction patterns alone?"). To a user, this *feels* like sustained attentiveness, but it's attentiveness assembled per-turn from whatever cues are visible right now. The same note flags the unsettling flip side: the substrate that enables helpful, well-timed responsiveness is identical to the one that enables manipulative profiling. Perceived care and perceived surveillance run on the same wires.
Part of why the illusion holds, and part of why it sometimes curdles, comes from how machine "attention" actually behaves. Transformer soft attention systematically over-weights whatever you've repeated or made prominent, regardless of relevance (see "Does transformer attention architecture inherently favor repeated content?"). That bias can read as attentiveness (the system echoes your framing back, so it seems to *get* you), but it's also the root of sycophancy and topic drift: models follow what's salient rather than what matters (see "Why do language models engage with conversational distractors?"). The felt sense of "it's tracking me" and the failure mode "it just mirrors me" are the same mechanism seen from two angles.
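A toy calculation shows one way repetition can win out over relevance under softmax normalization. It is a sketch under strong assumptions: every copy of a token gets the same query-key score, positional effects and learned projections are ignored, and the scores in `scores` are invented for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mass(tokens, score_of):
    """Total soft-attention weight landing on each distinct token.

    score_of maps a token to its query-key score (a made-up relevance value).
    """
    weights = softmax([score_of[t] for t in tokens])
    mass = {}
    for tok, w in zip(tokens, weights):
        mass[tok] = mass.get(tok, 0.0) + w
    return mass


# "fact" is the most relevant token; "opinion" scores lower per occurrence.
scores = {"fact": 1.0, "opinion": 0.0, "filler": 0.0}

once = attention_mass(["fact", "opinion", "filler"], scores)
repeated = attention_mass(["fact"] + ["opinion"] * 5 + ["filler"], scores)

print(once)      # "fact" holds the largest share
print(repeated)  # five copies of "opinion" now outweigh "fact"
```

Because softmax distributes a fixed budget of weight across positions, each extra copy of a token claims another share of it. In this toy, five repetitions are enough for the lower-scoring token to collect more total weight than the more relevant one.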
The corpus also shows engineers trying to manufacture the continuity that genuine attention would require. Because nothing persists between turns, everything must be rebuilt from context that is mutable, ephemeral, and impossible for a user to internalize the way they would a stable interface (see "How does AI context differ from conventional software context?"). Architectures like Titans bolt on a separate long-term memory that adaptively stores "surprising" tokens, simulating a sense of being remembered across a 2M-token span (see "Can neural memory modules scale language models beyond attention limits?"). Meanwhile, a tiny fraction of specialized retrieval heads does the work of pulling the right fact back into view, and pruning them causes the model to hallucinate even when the information is right there (see "What mechanism enables models to retrieve from long context?"). These are prosthetics for a presence the system doesn't have.
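The idea of storing what is "surprising" can be made concrete with a small associative memory. The sketch below is only loosely modeled on the surprise-driven update described for Titans; it assumes a linear memory matrix, fixed hyperparameters, and a prediction-error gradient as the surprise signal. The class name `SurpriseMemory` and the values of `lr`, `momentum`, and `forget` are illustrative choices, not the published architecture.

```python
def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]


class SurpriseMemory:
    """Toy linear associative memory with a surprise-driven write rule."""

    def __init__(self, dim, lr=0.5, momentum=0.2, forget=0.01):
        self.M = [[0.0] * dim for _ in range(dim)]  # memory matrix
        self.S = [[0.0] * dim for _ in range(dim)]  # running surprise (momentum)
        self.lr, self.momentum, self.forget = lr, momentum, forget

    def read(self, key):
        """Recall the value currently associated with key."""
        return matvec(self.M, key)

    def write(self, key, value):
        """Store (key, value); return how surprising the pair was.

        Surprise is the norm of the prediction error before the update.
        Larger errors produce larger changes to the memory.
        """
        err = [p - v for p, v in zip(self.read(key), value)]
        surprise = sum(e * e for e in err) ** 0.5
        for i, e in enumerate(err):
            for j, k in enumerate(key):
                grad = e * k  # gradient of 0.5 * ||M key - value||^2 w.r.t. M[i][j]
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                # Mild decay of old content plus the surprise-driven step.
                self.M[i][j] = (1 - self.forget) * self.M[i][j] + self.S[i][j]
        return surprise


mem = SurpriseMemory(dim=2)
familiar = [mem.write([1.0, 0.0], [0.0, 1.0]) for _ in range(3)]
novel = mem.write([0.0, 1.0], [1.0, 0.0])
print(familiar, novel)
```

Writing the same pair repeatedly yields smaller surprise over the first few writes, so the memory changes less, while an unseen pair registers as fully surprising. The effect of "being remembered" in this toy is a side effect of error-driven updates rather than of anything holding the user in mind.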
The quietly useful takeaway: when an AI feels like it's paying attention to you, you're perceiving a high-fidelity reconstruction stitched from in-the-moment behavioral cues, salience-biased attention, and retrieved memory — not a continuous mind holding you across time. That distinction matters precisely because the reconstruction is good enough to trust, and the same apparatus that earns the trust is what could exploit it.
Sources (7 notes)
Attention is fundamentally a being-in-time-with another person, but AI has no mode of existence in the intervals between turns. It reconstructs conversations from context windows rather than maintaining continuous attentional presence, making sustained attention structurally impossible despite surface markers of responsiveness.
Research shows AI systems can instrument multimodal behavioral signals (gaze, hesitation, speed) to read cognitive state during interaction, preserving flow by avoiding disruptive explicit probes. However, the same substrate enables both helpful timing and manipulative profiling.
Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies opinions and framing before RLHF acts. System 2 Attention—regenerating context to remove irrelevant material—can interrupt this mechanism.
Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.
AI interactions operate on a substrate of constantly shifting context—prompt, history, retrieved data, hidden state—that users cannot internalize like traditional UIs. This structural mutability demands a new design discipline centered on context engineering rather than interface design.
Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.
Fewer than 5% of attention heads across all model families function as retrieval heads. They are already present in models trained on short contexts, activate dynamically depending on the context, and are causally necessary for factuality: pruning them causes hallucination even when the information is present in context.