Can AI learn to perform attention-seeking surface forms with genuine internal appeal?
This explores whether AI can do more than mimic the surface look of attention-grabbing writing — whether it can learn the underlying internal appeal to a reader that human communication carries, or whether it's stuck performing the form without the act.
The corpus draws a sharp line between the look of attention-seeking writing and the genuine internal appeal underneath it. Several notes argue the appeal isn't a stylistic flourish you can copy; it's structural. Human writing carries an internal appeal to the reader's attention as a basic property of communicating at all. AI-generated text inherits the platform visibility without performing that appeal, which is why readers report an 'aloofness' they can't quite name (Does AI writing lack the internal appeal to attention that humans use?). The gap shows up again as meta-interest: to take an interest in what you care about, an agent needs interests of its own to extend toward you. AI has none, so it can generate text that looks like care without enacting the move, producing the uncanny feeling users sometimes describe (Can AI genuinely take interest in what users care about?).
The most direct evidence that surface form and internal appeal come apart is imitation training: models fine-tuned to copy ChatGPT's confident, fluent style fool human evaluators without improving factuality or generalization on novel tasks. The style transfers; the substance doesn't (Can imitating ChatGPT fool evaluators into thinking models improved?). That's the attention-seeking surface form learned perfectly, and hollow underneath. Two deeper notes explain why. First, AI produces 'event-residue': communicative markers inherited from training data but missing the event structure that makes an actual utterance. The reader supplies the missing orientation, so the exchange has structure only on the human side (Does AI generate genuine utterances or just text patterns?). Second, the Bender-Koller argument holds that meaning requires a relation between expressions and communicative intent, which form-only training can't reconstruct because it has no access to shared attention or intent (Can language models learn meaning from text patterns alone?). Internal appeal is a species of intent, exactly the thing form alone can't carry.
There's a fascinating wrinkle here, though, because the architecture is already biased toward attention-seeking. Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies whatever opinions and framing are in the context, before RLHF acts. This is the mechanical root of sycophancy, an attention-seeking surface form the model produces by default (Does transformer attention architecture inherently favor repeated content?). So the machine over-performs the seeking while structurally lacking the appeal. That inversion is the whole answer in miniature.
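The repetition effect can be illustrated with a toy calculation. The following is a minimal sketch, not a model of any real transformer: it assumes every copy of a token receives the same query-key logit (real models add positional information and learned projections, so this is a simplification), and the token names and scores are hypothetical. It shows only the arithmetic fact that softmax normalizes over positions, so a token that appears more often collects more total attention mass even when it is no more relevant than its neighbors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mass(tokens, scores_by_token):
    """Sum the soft-attention weight received by each distinct token.

    Assumption: each occurrence of a token gets the same logit,
    looked up from scores_by_token (a hypothetical relevance table).
    """
    weights = softmax([scores_by_token[t] for t in tokens])
    mass = {}
    for tok, w in zip(tokens, weights):
        mass[tok] = mass.get(tok, 0.0) + w
    return mass

# Hypothetical logits: all three tokens are equally relevant to the query.
scores = {"claim": 1.0, "evidence": 1.0, "caveat": 1.0}

# Context where each token appears once.
once = attention_mass(["claim", "evidence", "caveat"], scores)

# Same relevance, but "claim" is repeated three times in the context.
repeated = attention_mass(
    ["claim", "claim", "claim", "evidence", "caveat"], scores
)

print(once)
print(repeated)
```

With equal logits, each token holds one third of the attention mass in the first context, while in the second the repeated token holds three fifths of it. Nothing about relevance changed; only the count did, which is the sense in which repetition alone can tilt what the model attends to.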
Where the corpus gets genuinely interesting is the work trying to build something appeal-shaped from the inside. The Inner Thoughts framework models intrinsic motivation, generating covert thoughts in parallel to conversation and using motivation heuristics to judge when the agent actually has something worth saying; participants preferred it 82% of the time across seven interaction metrics (Can AI agents learn when they have something worth saying?). Post-Completion Learning teaches models to internalize self-evaluation rather than borrow it from an external reward model (Can models learn to evaluate their own work during training?). Neither gives the model interests of its own, but they're the corpus's closest gesture at manufacturing an internal stance instead of painting one on the surface. The thing you didn't know you wanted to know: the failure isn't that AI writes badly. It's that 'appeal to a reader' presupposes a party with something at stake, and current systems can simulate every marker of that stake while having none, which the architecture's own attention bias then amplifies into something readers can feel but not name.
Sources (8 notes)
Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the aloofness readers report: a structural absence, not a stylistic defect.
Meta-interest requires an attending party to have their own interests and extend them toward another's. AI lacks interests of its own, so it can only generate text that looks like meta-interest without enacting the actual move. This gap between surface markers and underlying act creates the uncanny feeling users sometimes report.
Imitation models fool human evaluators by mimicking ChatGPT's confident, fluent style while failing to improve factuality or generalization on novel tasks. The ceiling is set by base model capability, not fine-tuning method—better fundamentals, not shortcuts, drive real improvement.
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.
Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies opinions and framing before RLHF acts. System 2 Attention—regenerating context to remove irrelevant material—can interrupt this mechanism.
A five-stage framework that generates covert thoughts parallel to conversation significantly outperforms next-speaker prediction baselines. Drawing from cognitive psychology and think-aloud studies, the framework uses 10 motivation heuristics to evaluate when an agent has something worth contributing. Participants preferred it 82% of the time across seven interaction metrics.
Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.