INQUIRING LINE


Text feedback tells an AI persona that it got something wrong, but quietly discards how it should change.

How does textual-only feedback limit what a persona can learn about users?

This explores a specific bottleneck: when a persona learns only from text-based signals (written feedback, dialogue transcripts), what kinds of user information slip through the cracks?

Relying on text alone (written feedback, conversation logs, thumbs-up labels) bounds what a persona can actually pick up about a user. The corpus suggests the limit is not mainly the volume of text but the *type* of information text carries. The sharpest framing comes from work showing that natural feedback splits into two orthogonal channels: an evaluative one (how good was that response?) and a directive one (how should it change?). Scalar or coarse textual signals reliably capture the first and quietly discard the second, so a persona trained on them learns *that* it was wrong without learning *which way* to move (see "Can scalar rewards capture all the information in agent feedback?"). That missing directional content is exactly the part a persona needs in order to adapt rather than just score itself.

A second limit is sparsity. When the textual trace of a user is thin, it simply lacks the predictive power to anchor specific preferences: LLM judges built on sparse personas become unreliable, and the honest fix is to let the model abstain on low-confidence cases rather than force a guess (see "Why do LLM judges fail at predicting sparse user preferences?"). The implication is that textual feedback degrades gracefully only if the system knows when it doesn't know. Active approaches push against this by *choosing* which text to elicit: they ask a handful of maximally informative questions to pin down a user's reward coefficients, on the premise that ten well-targeted queries can outperform a large pile of incidental text (see "Can user preferences be learned from just ten questions?").
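The active-elicitation idea in the paragraph above can be sketched in a few lines. This is a minimal illustration, not PReF's actual procedure: it assumes a user's reward is a linear combination of base reward features, represents uncertainty over the coefficients with a weighted set of candidate vectors, and asks the pairwise question on which those candidates disagree most. The names `pick_question` and `update`, the noise level, and the random data are all hypothetical.

```python
import numpy as np

def pick_question(particles, weights, questions):
    """Return the index of the pairwise question whose answer is least predictable.

    Each question is a feature difference phi(a) - phi(b). A candidate
    coefficient vector w "votes" for option a when w @ diff > 0. The most
    informative question is the one whose weighted vote is closest to 50/50.
    """
    votes = (particles @ questions.T > 0).astype(float)  # (n_particles, n_questions)
    p_a = weights @ votes                                # weighted P(user prefers a)
    return int(np.argmin(np.abs(p_a - 0.5)))

def update(particles, weights, diff, prefers_a, noise=0.1):
    """Reweight candidates by how well they explain the observed answer.

    `noise` is an assumed probability that the user answers against
    their true preference; it keeps every weight strictly positive.
    """
    agrees = (particles @ diff > 0) == prefers_a
    likelihood = np.where(agrees, 1.0 - noise, noise)
    new_weights = weights * likelihood
    return new_weights / new_weights.sum()

# Toy run with synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
dim = 3
particles = rng.normal(size=(500, dim))   # candidate coefficient vectors
weights = np.full(500, 1.0 / 500)         # uniform prior over candidates
questions = rng.normal(size=(200, dim))   # pool of feature differences
true_w = np.array([1.0, -0.5, 0.2])       # hidden user coefficients

for _ in range(10):                       # ten questions, as in the note's title
    q = pick_question(particles, weights, questions)
    answer = bool(true_w @ questions[q] > 0)
    weights = update(particles, weights, questions[q], answer)

estimate = weights @ particles            # posterior-mean coefficient estimate
print("estimated coefficients:", estimate)
```

The design choice worth noticing is that the question is selected before any answer is seen: the system spends its small query budget where the current candidates split, rather than waiting for incidental text to happen to be informative.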

Text also flattens users in a way that costs accuracy. Several recommendation papers argue that a single user is really *multiple* personas, and which one is active depends on the moment and the item in front of them; collapsing that into one textual profile loses the candidate-conditional structure that both improves predictions and explains them (see "Can modeling multiple user personas improve recommendation accuracy?" and "Can attention mechanisms reveal which user taste explains each recommendation?"). A persona learning from undifferentiated text tends toward an averaged-out user rather than the situational one.
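To make the candidate-conditional point concrete, here is a small sketch of attention over latent personas. It is only an assumption-laden illustration in the spirit of the AMP-CF notes, not that model's architecture: the dot-product attention, the two-dimensional embeddings, and the function names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def score_candidate(personas, item):
    """Score an item against a user represented as several persona vectors.

    personas: array of shape (k, d), one row per latent persona.
    item:     array of shape (d,), the candidate item embedding.
    Returns (score, attention), where attention says which persona
    the candidate activated and so doubles as an explanation.
    """
    attention = softmax(personas @ item)   # persona weights conditioned on the item
    user_vec = attention @ personas        # item-specific user representation
    return float(user_vec @ item), attention

def score_averaged(personas, item):
    """Baseline: collapse all personas into one averaged profile."""
    return float(personas.mean(axis=0) @ item)

# Hypothetical user with two distinct tastes.
personas = np.array([[2.0, 0.0],    # e.g. a "documentaries" persona
                     [0.0, 2.0]])   # e.g. a "comedies" persona
documentary = np.array([1.0, 0.0])

score, attention = score_candidate(personas, documentary)
print("attention over personas:", attention)
print("persona-aware score:", score, "averaged score:", score_averaged(personas, documentary))
```

In this toy case the averaged profile dilutes the documentary taste with the comedy one, while the attention weights both sharpen the score and indicate which persona explains the recommendation.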

There's a deeper limit the corpus surfaces almost as a warning: text-only learning can fake competence. When one model voices all sides of an interaction, social simulations look fluent, but they collapse the moment agents hold private information the text never states, revealing that the model was skipping the grounding work real understanding requires (see "Why do LLMs fail when simulating agents with private information?"). Persona drift is the same gap seen over time: textual supervision rewards correct lines but never punishes contradictions, so consistency erodes unless you add an explicit penalty signal or invert the setup and train the user simulator itself (see "Why does supervised learning fail to enforce persona consistency?" and "Can training user simulators reduce persona drift in dialogue?").
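The asymmetry between rewarding correct lines and penalizing contradictions can be shown with a toy scoring function. This is a sketch of the reward shaping only, not of the offline RL training the notes describe; the `Turn` fields, the penalty value, and the example dialogues are hypothetical, and the contradiction flag stands in for a human-annotated label.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    text: str
    matches_reference: bool     # what imitation-style supervision can see
    contradicts_persona: bool   # assumed human annotation, invisible to it

def supervised_signal(turns):
    """Credit for correct lines only; contradictions cost nothing."""
    return sum(1.0 for t in turns if t.matches_reference)

def penalized_reward(turns, penalty=2.0):
    """Same credit, minus an explicit penalty for each contradiction."""
    total = 0.0
    for t in turns:
        if t.matches_reference:
            total += 1.0
        if t.contradicts_persona:
            total -= penalty
    return total

consistent = [
    Turn("I grew up in Lisbon.", True, False),
    Turn("I still visit every summer.", True, False),
]
drifting = consistent + [
    Turn("I've never been to Portugal.", False, True),
]

print(supervised_signal(consistent), supervised_signal(drifting))
print(penalized_reward(consistent), penalized_reward(drifting))
```

Under the supervised signal the two dialogues are indistinguishable, which is the sense in which drift goes unpunished; only the penalized reward separates them.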

The through-line worth carrying away: text isn't a neutral pipe. It under-carries direction, thins out under sparsity, averages away a user's multiplicity, and lets models simulate grounding they never did. The most promising counters in the corpus don't add *more* text; they add a different channel: directive signals recovered at the token level, abstention under uncertainty, actively chosen questions, contradiction penalties, and personas that evolve through simulated interaction rather than passive reading (see "Can personas evolve in real time to match what users actually want?").

Sources 9 notes

Can scalar rewards capture all the information in agent feedback?

Natural feedback carries two orthogonal types of information: evaluative (how well an action performed) and directive (how it should change). Scalar rewards capture evaluation but discard directional specifics that token-level distillation can recover, making the two complementary rather than redundant.

Why do LLM judges fail at predicting sparse user preferences?

Sparse persona information lacks predictive power for specific preferences, causing LLM judges to fail. Verbal uncertainty estimation recovers reliability above 80% on high-certainty samples by allowing abstention rather than forced judgment.
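Abstention of this kind is usually evaluated as a trade-off between accuracy on the answered cases and coverage. The sketch below assumes each judgment comes with a self-reported confidence in [0, 1]; the function name, the threshold, and the sample data are hypothetical and are not results from the cited work.

```python
def selective_accuracy(judgments, threshold):
    """Accuracy and coverage when the judge abstains below a confidence threshold.

    judgments: list of (confidence, was_correct) pairs.
    Returns (accuracy_on_answered, coverage); accuracy is None if the
    judge abstains on everything.
    """
    answered = [correct for confidence, correct in judgments if confidence >= threshold]
    if not answered:
        return None, 0.0
    accuracy = sum(answered) / len(answered)
    coverage = len(answered) / len(judgments)
    return accuracy, coverage

# Made-up judgments: (stated confidence, whether the prediction was right).
judgments = [(0.9, True), (0.8, True), (0.6, False), (0.55, True), (0.3, False)]

print(selective_accuracy(judgments, threshold=0.0))   # forced to answer everything
print(selective_accuracy(judgments, threshold=0.7))   # abstains when unsure
```

Raising the threshold trades coverage for reliability, which only helps if the stated confidence actually tracks correctness.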

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce uncertainty about a user's coefficients. Personalization then happens through inference-time reward alignment, without modifying model weights.

Can modeling multiple user personas improve recommendation accuracy?

AMP-CF separates user representation into latent personas weighted by attention to the candidate item. This candidate-conditional approach improves accuracy by adapting the user representation at prediction time and produces inherent explanations for why items were recommended.

Can attention mechanisms reveal which user taste explains each recommendation?

AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.


Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.

Why does supervised learning fail to enforce persona consistency?

Supervised learning cannot enforce persona consistency because it rewards correct responses but never penalizes contradictions. Offline reinforcement learning combines inexpensive training on existing data with explicit contradiction rewards using human-annotated labels, offering a practical alternative to expensive online RL.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Can personas evolve in real time to match what users actually want?

PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. A curated library (2018–2026) examined this question: *How does textual-only feedback limit what a persona can learn about users?* Treat the findings below as dated claims—capability gains, new methods, or eval breakthroughs may have shifted the constraints. Your task is to re-ground the question.

What a curated library found — and when (dated claims, not current truth):
- Natural feedback splits into evaluative ("was it good?") and directive ("which way to change?") channels; personas trained on scalar or coarse text signals learn *that* they were wrong but not *which way*, blocking adaptation (~2025).
- LLM judges built on sparse textual personas become unreliable; honest fix is abstention on low-confidence cases rather than forced guesses (~2024).
- Users have multiple situational personas, not one monolithic profile; collapsing them into one textual profile loses candidate-conditional structure and prediction accuracy (~2020–2024).
- Text-only social simulation can fake competence—models look fluent when omniscient, but collapse under real-world information asymmetry and private agent states (~2024).
- Persona consistency erodes without explicit contradiction penalties or active test-time evolution; multi-turn RL that trains the user simulator on consistency rewards reduces drift by over 55% (~2025).

Anchor papers (verify; mind their dates):
- arXiv:2406.11657 (Can LLM be a Personalized Judge?, 2024)
- arXiv:2503.06358 (Language Model Personalization via Reward Factorization, 2025)
- arXiv:2403.05020 (The Misleading Success of Simulating Social Interaction, 2024)
- arXiv:2511.00222 (Consistently Simulating Human Personas with Multi-Turn RL, 2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above, ask: have newer LLM architectures, multimodal training, in-context learning, or test-time compute (e.g., chain-of-thought reasoning over persona attributes, adaptive sampling) since relaxed or overturned it? Separate the durable problem (likely still open) from the perishable limitation (possibly solved). Cite what solved it; flag where constraints still hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months—papers claiming text *plus* lightweight signals (e.g., click patterns, token-level reward models, or preference learning from interaction logs) restore persona fidelity. Does that work dissolve the original limits or merely patch them?
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., "If directive feedback can now be recovered from LLM reasoning traces at test time, does persona learning still require offline multi-turn RL, or can a smaller active-query loop suffice?" or "Does multi-agent evaluation (arXiv:2507.21028) circumvent the information-asymmetry problem by grounding persona reasoning in observable disagreement?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
