INQUIRING LINE

How do user expectations change as chatbots remember more interactions?

This explores what happens to users' standards and demands over time as chatbots accumulate memory and history — not whether memory is technically useful, but how it reshapes what people come to want and how they react when it falls short.


The corpus points to an uncomfortable answer: remembering more raises the bar faster than it can be met. The clearest articulation is the "fidelity paradox": once a chatbot crosses a folk-model threshold of feeling human-like, users suddenly expect it to remember, catch subtext, and read emotional tone, and each improvement on one dimension inflates expectations on the others rather than closing the satisfaction gap (see "Why do improvements in AI conversation not increase user satisfaction?"). Memory is precisely one of those triggering dimensions: the better the system recalls, the richer the model the user builds of what it *should* be able to do.

The longitudinal work makes this concrete. Personalization, which depends on remembering, is a double-edged escalation: it builds trust and anthropomorphism, but each interaction raises the baseline, so failures land harder and feel more disappointing than they would have in a one-shot encounter (note slug: chatbot-personalization-creates-a-dual-dynamic-increasing-trust-and-anthropom). This is the temporal dynamic that single-session studies miss entirely. It also collides with a second force, fading novelty: relationship-formation processes with chatbots decline as the novelty wears off (see "Do chatbot relationships lose their appeal as novelty wears off?"), so over many interactions expectations rise while enchantment falls, a squeeze that early enthusiasm masks.

Why do expectations rise toward memory specifically? Because users import human conversational norms. They reciprocate self-disclosure the way they would with a person (see "Do chatbots trigger human reciprocity norms around self-disclosure?"), and once they have disclosed something intimate, they expect it to be held and carried forward: the relational logic of conversation assumes continuity. Yet the maintenance skills that make human conversation feel continuous (reference repair, topic hand-off, picking up where you left off) are implicit social actions that models don't naturally develop, because training rewards predicting information, not relational upkeep (see "Why don't language models develop conversation maintenance skills?"). So the user's expectation of seamless memory collides with a system that treats memory as data retrieval rather than relational work.

There's a subtler turn worth knowing: what users trust isn't actually accuracy or even faithful recall; it's the *feel* of contingent, responsive interaction. Conversationality drives trust in ChatGPT largely independent of whether it's right (see "Does conversational style actually make AI more trustworthy?"), and users mentally model agents mostly on perceived competence and human-likeness (see "How do users mentally model dialogue agent partners?"). This means a chatbot that *performs* remembering well can ratchet expectations up faster than its actual memory fidelity justifies, which is exactly how you manufacture future disappointment.

The research direction worth chasing is whether memory can be made to *evolve with* the user instead of just accumulating. PersonaAgent treats a persona as a living intermediary between memory and action, tuned at test time against recent interactions (see "Can personas evolve in real time to match what users actually want?"), and multi-turn RL on user simulators cuts persona drift by over 55% by rewarding consistency across turns (see "Can training user simulators reduce persona drift in dialogue?"). The bet behind these is that the real failure isn't forgetting but *inconsistency* across a long history, the thing that most violates the continuity users have quietly come to expect.
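As a rough illustration of what "rewarding consistency across turns" could look like, the sketch below folds the three consistency metrics named in the source note (prompt-to-line, line-to-line, Q&A consistency) into one scalar reward. Everything concrete here is an assumption made for illustration: the `ConsistencyScores` container, the `consistency_reward` and `episode_reward` functions, scores normalized to [0, 1], and the equal default weights are hypothetical and are not taken from the cited paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyScores:
    """Hypothetical per-turn scores, each assumed to be normalized to [0, 1]."""
    prompt_to_line: float  # agreement between the utterance and the persona prompt
    line_to_line: float    # agreement between the utterance and earlier utterances
    qa: float              # stability of answers to probe questions about the persona


def consistency_reward(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three scores into one scalar reward (a weighted mean).

    Equal weights are an assumption for illustration only.
    """
    values = (scores.prompt_to_line, scores.line_to_line, scores.qa)
    if len(weights) != len(values):
        raise ValueError("expected one weight per score")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("scores must lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, values))


def episode_reward(per_turn_scores):
    """Average the per-turn rewards over a whole conversation."""
    if not per_turn_scores:
        raise ValueError("need at least one turn")
    return sum(consistency_reward(s) for s in per_turn_scores) / len(per_turn_scores)
```

With the default equal weights, per-turn scores of 1.0, 0.5, and 0.0 average to a reward of about 0.5. How the underlying scores are produced (for example, by a judge model) is deliberately left out of the sketch.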


Sources (10 notes)

Why do improvements in AI conversation not increase user satisfaction?

Conversational AI that crosses a folk-model threshold of human-like interaction triggers rich expectations about memory, subtext, and emotional tone. Each improvement raises expectations for other dimensions rather than closing the satisfaction gap, making quality gains invisible to user satisfaction.

Do chatbot relationships lose their appeal as novelty wears off?

Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.

Do chatbots trigger human reciprocity norms around self-disclosure?

In a 372-participant study, users reciprocated with deeper self-disclosure when chatbots displayed consistent emotional sharing, outperforming adaptive matching. This follows human interpersonal norms where emotional vulnerability produces emotional response.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Does conversational style actually make AI more trustworthy?

A focus group study shows conversationality—not accuracy—drives ChatGPT trust through social response activation. Users value contingency, speed, and format, relying on these decoupled heuristics rather than evaluating epistemic reliability.

How do users mentally model dialogue agent partners?

The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.

Can personas evolve in real time to match what users actually want?

PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether user expectations around chatbot memory have shifted as models, training methods, and deployment contexts have evolved. The question remains: how do user expectations change as chatbots remember more interactions?

What a curated library found — and when (dated claims, not current truth):
These findings span 2021–2026 and capture a moving target:
• The "fidelity paradox": once a chatbot feels human-like, users expect seamless memory, emotional reading, and subtext-catching; each improvement inflates expectations on other dimensions rather than closing satisfaction gaps (2021–2023).
• Personalization builds trust and anthropomorphism but raises baselines so failures feel harder; novelty decay and rising expectations squeeze satisfaction simultaneously (2023–2024).
• Users import human conversational norms: they reciprocate self-disclosure and expect it to be held forward, yet models treat memory as data retrieval, not relational work (2021–2023).
• Trust in chatbots correlates more with *conversational contingency* than accuracy; this performative remembering ratchets expectations faster than fidelity justifies (2023).
• Persona drift (inconsistency across long histories) violates the continuity users expect; multi-turn RL reduces drift by over 55% by rewarding cross-turn consistency (2025–2026).

Anchor papers (verify; mind their dates):
- arXiv:2106.01666 (2021) – reciprocal self-disclosure norms
- arXiv:2308.07164 (2023) – partner modelling and perceived competence
- arXiv:2506.06254 (2025) – PersonaAgent: persona as evolving intermediary
- arXiv:2511.00222 (2025) – multi-turn RL for persona consistency

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer evals (long-context, memory benchmarks like MemLong, user studies with Claude/GPT-4o), training advances (instruction-tuned memory, RAG with semantic consistency), or deployment patterns (persistent contexts, multi-agent orchestration with state management) have since relaxed or overturned it. Separate the durable question (users will always expect contingency) from the perishable limitation (systems *cannot* maintain relational continuity). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months: has anyone shown that user expectations *plateau* or even *decline* with memory depth? Does synthetic user simulation replace real longitudinal study?
(3) Propose 2 research questions that ASSUME the regime may have moved: (a) Can adaptive forgetting—strategic erasure of old context—satisfy users better than perfect recall? (b) Do users prefer *transparent memory limits* over performative consistency?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines