INQUIRING LINE

How should AI systems model relationship evolution within a specific ongoing conversation history?

This explores how an AI could track a relationship as it changes over the course of a single, continuing conversation history — not generic personalization, but the moving target of who these two parties are becoming to each other across turns and sessions.


What has to be modeled is the running state of mutual understanding as it *evolves*, not a fixed user profile. The corpus suggests the first thing to model isn't content at all, but belief. "Can dialogue systems track both speakers' beliefs across turns?" offers an information-theoretic frame (CRSA) in which both speakers' beliefs are tracked bidirectionally across turns, capturing the progression from partial to shared understanding that plain token-prediction LLMs never represent. That's the missing scaffold: a relationship is two parties' models of each other updating, and most systems don't track either side.
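To make "both speakers' beliefs updating across turns" concrete, here is a minimal sketch using a plain Bayesian update. It is not the CRSA method itself; the hypothesis labels, the `DyadBeliefs` class, and the likelihood numbers are all illustrative assumptions.

```python
from dataclasses import dataclass, field


def bayes_update(prior, likelihood):
    """Return the normalized posterior over the hypotheses in `prior`."""
    unnorm = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    total = sum(unnorm.values())
    if total == 0:
        # Evidence rules out every hypothesis: fall back to the prior.
        return dict(prior)
    return {h: p / total for h, p in unnorm.items()}


@dataclass
class DyadBeliefs:
    # beliefs[x] is what party x currently believes about the other's intent.
    beliefs: dict
    history: list = field(default_factory=list)

    def observe(self, speaker, likelihood):
        """A turn by `speaker` updates the listener's belief, not the speaker's."""
        listener = "b" if speaker == "a" else "a"
        self.beliefs[listener] = bayes_update(self.beliefs[listener], likelihood)
        self.history.append((speaker, dict(self.beliefs[listener])))


# Hypothetical intents and likelihoods, chosen only for illustration.
hypotheses = ["wants_advice", "wants_to_vent"]
uniform = {h: 0.5 for h in hypotheses}
dyad = DyadBeliefs(beliefs={"a": dict(uniform), "b": dict(uniform)})
dyad.observe("a", {"wants_advice": 0.2, "wants_to_vent": 0.8})  # updates b
dyad.observe("b", {"wants_advice": 0.3, "wants_to_vent": 0.7})  # updates a
```

The point of the sketch is the shape of the state: two belief distributions that move on alternating turns, with a history that records how partial understanding becomes shared.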

But several notes warn that relationship is built from implicit relational *work*, not information. "Why don't language models develop conversation maintenance skills?" argues that the glue (reference repair, topic hand-off) is social action that training-for-prediction never rewards, so models never develop it. "Why don't conversational AI systems mirror their users' word choices?" makes the same point concretely: humans converge on each other's word choices to build rapport, and current systems fail to mirror users' vocabulary (though DPO on coreference-identified preferences can teach in-context convention formation). So 'modeling relationship evolution' partly means modeling these accumulating micro-conventions, not just facts exchanged.
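One rough way to observe word-choice convergence is to measure how much of a reply's content vocabulary was already used by the user. The sketch below is an assumed proxy, not the metric or the DPO procedure from the cited note; the stopword list and the function names are hypothetical.

```python
import re

# Tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "it",
             "i", "you", "my", "your", "in", "on", "for"}


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def entrainment_score(user_turns, reply):
    """Fraction of the reply's content words that the user has already used."""
    user_vocab = set()
    for turn in user_turns:
        user_vocab.update(content_words(turn))
    reply_words = content_words(reply)
    if not reply_words:
        return 0.0
    reused = sum(1 for w in reply_words if w in user_vocab)
    return reused / len(reply_words)


user_turns = ["My laptop keeps freezing", "The laptop freezes on startup"]
mirrored = entrainment_score(user_turns, "Does your laptop freeze on every startup?")
unmirrored = entrainment_score(user_turns, "Does the device hang during boot?")
```

Exact token overlap misses inflected forms ("freeze" versus "freezes"), so a fuller version would lemmatize first; the sketch only shows that a convention-tracking signal can be computed per turn.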

A second axis is the *temporal* one. "How do time gaps shape what people discuss across conversation sessions?" shows that elapsed time between sessions reshapes how past events get discussed (specificity, emotional tone, and relevance all shift) and that speaker relationships evolve in ways single-session models can't capture; the Conversation Chronicles dataset and chronological summarization are the entry point here. "Can tracking dialogue dimensions simultaneously reveal hidden conversation patterns?" proposes tracking several dimensions at once as parallel temporal streams (linguistic complexity, emotional trajectory, topic coherence, relevance), treating the dialogue as a living system rather than a flat transcript. And "Can personas evolve in real time to match what users actually want?" gives a concrete mechanism: a persona that updates at test time by simulating recent interactions against feedback, sitting between memory and action. It is arguably the closest thing in the corpus to an evolving relationship-state object.
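The idea of parallel temporal streams combined with time-gap sensitivity can be sketched as timestamped per-turn signals read back with a recency weighting. The field names, the seven-day half-life, and the scores below are assumptions for illustration; in practice each score would come from some classifier, and none of this is taken from the cited systems.

```python
from dataclasses import dataclass, field


@dataclass
class TurnSignals:
    day: float        # time of the turn, in days since the first session
    emotion: float    # hypothetical score in [0, 1]
    coherence: float  # hypothetical score in [0, 1]
    relevance: float  # hypothetical score in [0, 1]


@dataclass
class Streams:
    turns: list = field(default_factory=list)

    def add(self, signals):
        self.turns.append(signals)

    def stream(self, name):
        """One dimension as a time series of (day, value) pairs."""
        return [(t.day, getattr(t, name)) for t in self.turns]

    def decayed_mean(self, name, now, half_life_days=7.0):
        """Mean of one dimension, halving a turn's weight every half-life."""
        weights = [0.5 ** ((now - t.day) / half_life_days) for t in self.turns]
        total = sum(weights)
        if total == 0:
            return 0.0
        return sum(w * getattr(t, name) for w, t in zip(weights, self.turns)) / total


streams = Streams()
streams.add(TurnSignals(day=0.0, emotion=0.0, coherence=0.9, relevance=0.8))
streams.add(TurnSignals(day=7.0, emotion=1.0, coherence=0.7, relevance=0.6))
recent_emotion = streams.decayed_mean("emotion", now=7.0)
```

With a one-week gap and a one-week half-life, the older turn counts half as much as the newer one, so the recency-weighted emotion sits closer to the latest session than a flat average would.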

The corpus also contains a sharp skeptical counter-current arguing that the whole premise may be structurally unavailable to LLMs. "Does an LLM have anything that persists between conversations?" points out that human relationships persist because a continuous biological self carries interaction effects through dormancy, whereas an LLM instance is reconstituted from stored text each time, making a 'resumed' conversation structurally identical to a brand-new one. "Do chatbot relationships lose their appeal as novelty wears off?" adds that the social pull people feel actually *decays* predictably as novelty wears off, so any model of relationship evolution must account for decline, not just deepening. And the trust literature (two notes, both titled "How do people build trust with conversational AI?") shows the relationship runs through interaction dynamics rather than a credible 'speaker', with sycophancy measurably eroding the conflict repair that real relationships depend on, even as users prefer it.

The synthesis: model relationship evolution as a *bidirectional, decaying, multi-dimensional belief state* that updates per turn (CRSA-style belief tracking + Conversational-DNA-style temporal streams + an evolving persona intermediary), while explicitly representing the relational work that actually constitutes the bond: entrainment, repair, and proactive clarification ("When should AI agents ask users instead of just searching?", "Could proactive dialogue make conversations dramatically more efficient?"). But build it knowing the corpus's hardest claim: without a persistent host, the system may be performing relationship continuity from text rather than truly carrying it.
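A compact sketch of the state object this synthesis points toward is shown here. The class name, the event weights, and the 30-day half-life are assumptions made up for illustration, not values from any cited work. Note that the state survives between sessions only as serialized text, which is exactly the "performed from text" caveat.

```python
import json
from dataclasses import dataclass, field, asdict

# Assumed increments for each kind of relational work (illustrative only).
RELATIONAL_WORK = {"entrainment": 0.05, "repair": 0.10, "clarification": 0.08}


@dataclass
class RelationshipState:
    rapport: float = 0.0
    last_day: float = 0.0
    conventions: dict = field(default_factory=dict)  # shared micro-conventions

    def update(self, day, events=(), conventions=None, half_life_days=30.0):
        """Decay rapport over the elapsed gap, then credit relational work."""
        gap = max(0.0, day - self.last_day)
        self.rapport *= 0.5 ** (gap / half_life_days)
        for event in events:
            self.rapport = min(1.0, self.rapport + RELATIONAL_WORK.get(event, 0.0))
        self.conventions.update(conventions or {})
        self.last_day = day

    def to_text(self):
        """Serialize to JSON text, the only form that outlives a session."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_text(cls, text):
        """Reconstitute the state from stored text at the start of a session."""
        return cls(**json.loads(text))


state = RelationshipState()
state.update(0, ["repair"], {"the big project": "thesis"})
state.update(30, ["entrainment"])  # a 30-day gap halves rapport before the new credit
restored = RelationshipState.from_text(state.to_text())
```

A "resumed" session and a fresh one run the same `from_text` path; the only difference is what the stored text contains, which is the structural point the skeptical notes raise.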


Sources (12 notes)

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.

How do time gaps shape what people discuss across conversation sessions?

Multi-session conversations reveal that elapsed time significantly alters specificity, emotional tone, and relevance when discussing past events, and speaker relationships evolve in ways single-session models cannot capture. The Conversation Chronicles dataset (1M dialogues) and REBOT model demonstrate this through chronological summarization.

Can tracking dialogue dimensions simultaneously reveal hidden conversation patterns?

Conversational DNA encodes four simultaneous dimensions—linguistic complexity, emotional trajectories, topic coherence, and conversational relevance—as temporal streams. The reverse Turing test finding showed expert assessments of AI diverged sharply, suggesting conversational structure shapes interpretation as much as content.

Can personas evolve in real time to match what users actually want?

PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.

Does an LLM have anything that persists between conversations?

While humans have a continuous biological-phenomenological substrate that preserves interaction effects during dormancy, LLMs have no analogous carrier. The virtual instance is reconstituted from stored text each time, making resumed and new conversations structurally identical.

Do chatbot relationships lose their appeal as novelty wears off?

Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.

How do people build trust with conversational AI?

Users extend social norms to chatbots and reciprocate self-disclosure, but AI claims cannot anchor trust the way human personas do. The absence of human judgment enables both deeper vulnerability and easier dishonesty—the same mechanism serves both.

How do people build trust with conversational AI?

Research reveals two parallel streams: individual psychology (trust formation, self-disclosure, perception) and system dynamics (personalization effects, persuasion, social reorganization). Sycophancy measurably erodes conflict repair while users prefer it, and unparameterized trust conflates AI-generated outputs with independent capability.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether LLMs can model relationship evolution within a single ongoing conversation. The question remains open: what representation of mutual belief update, relational work, and temporal dynamics would let a system track how understanding and rapport actually deepen—or decay—across turns?

What a curated library found—and when (findings span 2021–2025, treat as dated claims):
• Belief-tracking via collaborative rational speech acts can represent bidirectional speaker models across turns, capturing partial-to-shared understanding progression that plain token prediction misses (~2025).
• Lexical entrainment (human word-choice convergence for rapport) is absent from current systems; DPO on coreference-identified preferences can teach in-context convention formation (~2023–2025).
• Multi-dimensional temporal streams—emotional trajectory, topic coherence, relevance—can model dialogue as a living system rather than flat transcript (~2025).
• Test-time persona update (simulating recent interactions against feedback) offers a concrete evolving relationship-state mechanism (~2025).
• Structural barrier: without a persistent biological host, LLM instances are reconstituted from text each session, making a 'resumed' conversation structurally identical to a new one; relationship may be *performed* from text rather than carried (~2021–2025).

Anchor papers (verify; mind their dates):
- arXiv:2507.14063 (Collaborative Rational Speech Act, 2025)
- arXiv:2510.01395 (Sycophantic AI & Prosocial Decline, 2025)
- arXiv:2310.13420 (Conversation Chronicles, 2023)
- arXiv:2506.06254 (PersonaAgent, 2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For belief-tracking, entrainment, and temporal modeling: have newer model scales, DPO variants, long-context windows, or memory-augmentation systems (e.g., retrieval-augmented persona, retrieval-augmented belief state, multi-turn supervised fine-tuning) since relaxed or dissolved these gaps? For the structural 'no-host' objection: have session-persistent memory architectures or conversation-level credential systems (reputation, coherence scoring) changed whether the system *appears* to carry relationship? Separate the durable question (can an LLM track mutual belief *updates*?) from the perishable limitation (does it lack the training signal for relational work?). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any showing that sycophancy or novelty-decay outweigh relationship modeling, or that relationship is *orthogonal* to task performance.
(3) Propose 2 research questions assuming the regime may have moved: e.g., "Can belief-state tracking + long-context retrieval + DPO-on-repair outperform flat persona?"; "Does relationship depth correlate with downstream task success, or is it decorrelating?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
